Less is more.

Please put fewer things in new language designs, not more. I say this on the slim chance it takes, and simpler designs are seen. I'll try to make my point briefly; note this is also what I think a design should do: make points briefly. First I'll make a comment on Rust. Then I'll offer to argue for simplicity in a language you talk about here if you want.

My guess is Golang folks will win, because Rob Pike gets the less idea. To compete Rust folks must go small, not big, so a language lawyer can explain in simple terms and still be correct. All I want is for things not to suck. I don't have an agenda to sell Go or Rust, I just think competition is good. So I don't want Rust folks to lose.

Rust folks have a clear disadvantage when more folks are involved in design: getting complex is often how smart folks compete, even when this loses the game for a group, as individuals win prestige for themselves in isolation. In game theory terms, folks betray long term global goals to get short term local reward. (Actually, there are several incentives to be bad; some folks like a priesthood model where complexity offers future job security in the understanding bottleneck. But this isn't as big a problem as complexity smelling good right now in arguments.)

Compare with C++: it got more complex over time, and adding new complexity is going strong. Folks who want a low level language must abandon it, because committee direction is toward high level. I used C++ from 1990 to 2010, but the direction of 2011 designs showed me I need to go elsewhere when I need runtime control. Or compare with Scheme, especially when it comes to the way R6 differed from R5. (I haven't looked at recent Scheme efforts.) If you want design to be simple, it must be a goal. It won't happen accidentally when more than a few hands get involved.

How do you know when it's simple enough? There's an empirical test. Write down rules, give them to a fresh newcomer of mean ability and intelligence, and see if they can apply and defend them in various cases — especially when attacked by a bright lover of complexity. Your average practitioner should be able to say, "The simple rule here dictates solution X to your wicked problem Y." You did it right when this can occur. You did it wrong when it requires an expert to adjudicate normal discussion. It's a design smell if any part requires an expert to grasp. Another design smell: you grasped it last week, but not today.

Boil things down to basic definitions where they work when applied to complex cases. If you're so smart, you should be able to do this, right? Ask others for help when you want to know if it's simple now, because your perspective is suspect. Be agreeable to other folks rewriting your rules so they get easier to learn and apply. Tease folks who try to make things harder to understand because it makes them look good. You and your children are going to end up driving a car controlled by software developed by non rocket scientists. Ease up on them.

Simple isn't enough. The

Simple isn't enough. The complaints about Go aren't over its simplicity, but about insufficient generality (particularly parametric polymorphism). Adding special cases to its monomorphic core just for collections is the antithesis of simplicity.

Pursuing generality also places simplicity at risk, which is what you are essentially complaining about. Sacrificing generality for simplicity is just as bad though.

Some well-balanced intersection of these two factors would make for a successful language from a technical standpoint, although technical merits aren't the only factors needed for success.

Important vs enough.

To restate my thesis: a majority of devs using a tool should grasp its rules, be able to cite them, and handle questions via rules with clear reasoning that's usually correct. Failure to grasp, cite, or apply rules is a problem. My concern is devs 1) knowing what they're doing, 2) being able to communicate, 3) finding it easy to think, and 4) being able to stop well-poisoners. Specifically, I claim lack of simplicity is the major factor here. It would be ironic if that claim requires a complex justification.

Enough for what? Clearly most simple is nothing at all: no tool, no rules, no syntax, nothing. But this provides no value. So it's obvious simplicity alone is not a sufficient quality for usefulness. So insert my everything is a tradeoff post from last June here, and take for granted tradeoffs are always necessary, and that I value simplicity as a primary enabling feature.

I didn't criticize Go. I implied criticism of Rust. (Is a Hindley-Milner type system part of the Rust spec? Awesome, let's get a random average Joe to explain and adjudicate that in discussions.) I didn't mention type systems specifically, because that's not really the issue. Grasp of a whole spec for most devs is the issue. You can say I'm wrong, but that's what I was saying.

Clearly many other things compete with simplicity in tradeoffs. I won't bother to enumerate them. I assume you cite generality because it's good for starting an arm-wrestling contest where folks with complex explanations win. Good strategy on your part, since you can dump as much detail as needed to overwhelm a conversation with trivia folks have trouble dismissing without seeming lightweights. Can you define parametric polymorphism in few enough clear words — say 50 or 100 — that an average coder can use those rules to settle design issues? Or is that something devs should not worry their pretty little heads about?

Sacrificing generality for simplicity is just as bad though.

No, that's not an equal tradeoff. Lost generality is just a local hole, while too much lost simplicity undermines pervasively.

No, that's not an equal

No, that's not an equal tradeoff. Lost generality is just a local hole, while too much lost simplicity undermines pervasively.

I disagree. Insufficient generality impacts global program structure as well.

Not enough simplicity, and you force users to define workable subsets of your language (cf. C++ style guidelines).

Not enough generality, and you force your users into informally specified program structures that can't be auto-checked (cf. design patterns), or implement ad-hoc code reuse mechanisms (cf. copy-paste or code generation templates).

A Turing Machine is simple,

A Turing Machine is simple, and it's also general, but trying to do something big with it is not simple. Hence my own emphasis on abstractive facilities; the trick is to start out with a simple, general language, and provide for building your code base by means that tend to preserve the simplicity and generality. (My suggestion for how to do that is embodied in the Smoothness Conjecture in my dissertation, smoothness being the property that the abstractive facilities apply to the base language in a free and uniform way.)

By "generality", I'm

By "generality", I'm referring to expressiveness properties (close to your abstractive facilities). Hence my example about Go and its lack of parametric polymorphism.

I'm not sure I'd refer to Turing machines as "simple" in the sense McCusker means, although they are certainly minimal. Simple to humans is not the same as formal simplicity.

An analogy I've

An analogy I've considered:

Expressiveness (in more-or-less the sense of Felleisen) is the derivative of semantics, in that it concerns how semantics moves when changing the language. In that sense, abstractiveness is the derivative of expressiveness, in that it concerns how expressiveness moves when changing the language.

More or less. The analogy is slippery, but has some mathematical justification in my one attempt (so far) to formally define abstractive power.

So, when I talk about abstractive power, I'm trying to get at something related to, but different than, expressive power.

Clearly many other things

Clearly many other things compete with simplicity in tradeoffs. I won't bother to enumerate them. I assume you cite generality because it's good for starting an arm-wrestling contest where folks with complex explanations win.

Or perhaps there's merit to some of those contests which you denounce? In fact, the argument is simple: bug counts correlate with code length. Reduce code length, reduce bugs.

More generality implies less code due to reuse, and thus, fewer bugs. No complex explanations or trivia required. The general code also gets wider use and thus more testing, thus further improving robustness.

Of course, some other measures of complexity also correlate with bug counts, and so conciseness isn't the ultimate metric when generality requires complex acrobatics. But let's not pretend that your specific simplicity metric somehow trumps all other properties.

Can you define parametric polymorphism in few enough clear words — say 50 or 100 — that an average coder can use those rules to settle design issues?

Parametrically polymorphic code can be safely used with any type (where "safe" is a compile-time guarantee, not a runtime assertion).

As a design guideline: if your algorithm doesn't depend on the specific type of the value, make it a polymorphic parameter so it can be reused in as many contexts as possible. Simple.
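
Here's a rough Haskell sketch of that guideline (the names are mine, purely for illustration). The body never inspects its arguments, so nothing forces it to a specific type:

pairInts :: Int -> Int -> (Int, Int)   -- needlessly tied to Int
pairInts x y = (x, y)

pair :: a -> b -> (a, b)               -- same body, with polymorphic parameters
pair x y = (x, y)

-- pair 1 2, pair "a" True, pair 'x' [1,2] all reuse the one definition.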

Why am I doing this on a day off again?

The sort of generality I dislike most is encyclopedic, itemizing a solution for every problem anyone ever considered, without any common issue elimination, so nothing is resolved. To solve your problem, you might have to wade into every problem to find yours. But another kind of generality finds and removes gratuitous duplication, so there's less piled up afterward. So generality can increase simplicity — but when you brought generality up, you contrasted it with simplicity in a tradeoff, when one or the other must be chosen without getting both. When you get both, I say terrific. But if simplicity must actually be sacrificed, you can only lose so much before loss gets nonlinear with respect to other gains.

(I'm very familiar with complexity caused by being told, "Make this go faster, no matter how complex it gets. Yes, you can make it twenty times as complex as long as it goes 5% faster." That's a tradeoff I understand, even if it makes me queasy.)

Reduce code length, reduce bugs.

Generality that reduces code length is a good trend because it's simpler. That's assuming complexity didn't just get folded into another place where you're going to encounter it anyway. Suppose Max re-arranges code in a maximally clever way, so when Jim uses it, normal use is smaller and simpler which Jim likes ... up until Jim finds a bug and must understand what happened. If Max made it easy to understand and debug, we still have a win. (Give Max a raise.) But if Max did something that looks ten (or a hundred) times more complex to Jim than if Max had never done anything, this is bad. Everything else being equal, shorter is better.

However, many ways of making things shorter cause another task to get harder. Compression can be cryptic. And indirection can make code shorter, but make tracking down what happened later harder. Have you ever seen a backtrace after a crash that contains a symbol that appears nowhere in the code? Obviously I mean macros that generate symbols, a practice I dislike a lot, or templates of another sort. Re-arranging code cleverly must include a simple algorithm for where to look when something goes wrong, or else shorter is not really simpler. Jim should be able to ask 1) where is the rule? 2) where is the code? 3) where is the test? 4) where is the tracing or logging? 5) where is the nuance handling my type's little odd details?

I get that you're saying parametric polymorphism can eliminate duplicate code, and this is good. In a vacuum, we can't tell whether this is good until we tally up any other cost that went up, if any. Right away we see there's extra cost if Jim doesn't know the answer when Joe asks, "What happened here? What does this construct mean? Why did my constructors get called in this order? How did you get this memory when I require a special purpose allocator? How did the build break in the unit test when it got to my code? Who wrote this one function I see calling mine in this core dump?" I skipped the funny whining about being in the office on a holiday. You know what I mean.

Parametrically polymorphic code can be safely used with any type ...

What if Jim suspects his code is safe that way, but discovers he was wrong the hard way by runtime failure. What does Jim do next?

I appreciate you humoring me by writing with such excellent brevity. Your description of the effect of parametric polymorphism is nice and short. And it's definitely simpler when it gets shorter, and nothing else gets more complex. But I was wondering if the definition of what it is can be short when written in terms Jim uses to explain to Joe what went wrong.

The sort of generality

The sort of generality I dislike most is encyclopedic, itemizing a solution for every problem anyone ever considered, without any common issue elimination, so nothing is resolved.

Sounds familiar from somewhere. How about this:

Programming languages should be designed not by piling feature on top of feature, but by removing the weaknesses and restrictions that make additional features appear necessary.   — RnRS, for n ≥ 3.

Although, admittedly, I felt the Scheme reports abrogated the right to claim that philosophy when they added hygienic macros.

Generality through minimized assumptions vs. cases

I've noticed the same distinction as you between two things that people call "generalization". The bad kind usually involves explicitly adding extra code to handle the general case instead of making simplifying assumptions. The good kind of generality, though, involves hardcoding fewer assumptions into the code that you needed to write anyway, such that generality comes "for free". Parametric polymorphism, at least when it can be inferred, is an example of this (or at least of getting generalization on the cheap).
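
For instance, in Haskell (a sketch of mine, not from the post above), leaving off the type annotation is enough to get the general version; the compiler infers the most permissive type because no assumption about the element types was hardcoded:

swap (x, y) = (y, x)
-- inferred: swap :: (a, b) -> (b, a)
-- the generality came "for free" from not writing anything type-specific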

minimizing assumptions

The good kind of generality, though, involves hardcoding fewer assumptions into the code that you needed to write anyway, such that generality comes "for free".

The part I like is identifying what must be assumed, then trimming it down. Does the phrase "generic methods" mostly encompass what is meant when folks talk about parametric polymorphism? (A counter example would be a good way to say no.) Fewer syllables in "generics" is much better.

If I ever implemented generics, I'd carefully avoid the words "parametric polymorphism" in a spec because (I see from looking at Wikipedia) that well is already poisoned, if you want brevity and clarity an average dev has no trouble parsing. Then if someone else said it was an identical concept, I could ignore that and keep using my short spec with no pointers to type theoretic analysis.

As a tactic, I like defining local simple rules instead of re-using global complex rules. Then if simple rules have a clear relation to complex ones, taking a long explanation to convey, this can be ignored by all but those who care. Is that possible in case of this feature?

terminology appropriate to the audience

If you are writing a spec for devs to read, the usual thing is to talk about generic or polymorphic functions and then use that name for the rest of the document.
Language designers talk about parametric and ad hoc polymorphism because it is a useful distinction to make when designing a type system. However once a specific language has polymorphic functions they are called "generics" because the developers need to know how they work, not how they could have been designed differently.

For the same reason when you buy a hammer, you are told about the weight and size not whether it is made of T50 or T70 steel. But for tool designers the grade of steel matters so they use the appropriate terminology.
I think we agree that it makes sense to not talk about parametric polymorphism in your documents for developers, but that it does make sense to use the term here, where we analyze the design of programming languages.

re: appropriate terminology

That's a great answer to my question, and use of plain language is much appreciated.

... but that it does make sense to use the term here, where we analyze the design of programming languages.

That seems fitting. We may also do something else besides analysis, but it's hard to characterize what it is. I try to say when models may puzzle devs I know, because code they write scares me when they proceed without understanding. I've had to revise my idea of what is clear enough in docs downward over time.

but when you brought

but when you brought generality up, you contrasted it with simplicity in a tradeoff, when one or the other must be chosen without getting both.

I don't object to anything you said except this line. I didn't introduce the tension between simplicity and generality, I'd argue you (implicitly) did when you claimed that Pike "gets it", and cited Rust and C++ as examples that do not "get it".

Pike intentionally sacrificed parametric polymorphism on the altar of simplicity. If we're to use Go as a good case study for simplicity, we should also admit Go as a good case study of frustrating non-generality.

I tried to say from the outset that a simplicity/generality tradeoff has at least one maximum, but that they're not the only two properties requiring such balance. Language expressiveness and utility is a multivariable function with many local maxima. Tweaking these subtle variables to achieve a good balance of simplicity and expressive power is still an art.

Sorry I put you on the wrong poker hand.

Ah, I see where you got that: an indirect chain of implication where my saying Pike gets "less" might mean I approve of everything he ever said or decided about Go. We should tighten up the scope, and not permit as much indirection that expands context to include a lot more than we say ourselves locally. Otherwise to say something, we end up individually responsible for what was said anywhere else too, by anyone, and not just what we say now ourselves.

I like that Pike seems to assume features are excluded by default, unless explicitly chosen for inclusion for good reason. This is an aspect of getting what "less" means: reject until a good case is made for value by inclusion. Or similarly, reject at first to reduce scope; but that's more important for small projects. Less is a good design esthetic. (Note I'm not sucking up to Pike; I've no desire to work at Google.)

Pike intentionally sacrificed parametric polymorphism on the altar of simplicity.

Well, after my last paragraph, I just take this as a "reject by default" outcome as opposed to condemnation of parametric polymorphism. Maybe he did blast it, though, I don't keep track of everyone's position like a score.

The following is an exploration of attitude I see in some C programmers, not a statement about personal perspective, okay? Polymorphism itself can be seen as obfuscation, when you have to figure out type in order to know what code runs when a call occurs. I think Torvalds cited that as a bad quality of C++, that method names are not unique as in C, as it affects static apprehension of code in a coder. (I don't want to discuss what Torvalds thinks, or defend anything he said, I'm just making my comment concrete by pointing out one example. Here's a case where it's important not to transitively expand my statements to include everything said by anyone I mention.) Anyone with that attitude sees lack of unique function names as cost, and therefore less simple. If you write polymorphic code that works on different types because the operations use the same symbols, this looks complex to anyone who now sees ongoing disambiguation of those symbols as a problem.

There's a reason I often describe viewpoints other folks hold without either agreeing or disagreeing with them. I don't require a unified world view. I seldom agree with people, and yet I want to understand, so if I refused to grasp something until I agreed, I wouldn't know much when busy rejecting things outside my current set of opinions. I think curating a consistent world of the "correct" views to hold is a waste of time because convergence is the exception and excludes too much. I suspect a desire to curate an approved list of "best practice views" (repels me to say that) is what motivates folks to include everything in the world in tech discussions, as if magically everything is going to converge at the end, instead of pruning early proactively with the knowledge everything is not going to fit. Kitchen sink designs come from an optimistic hope everything can probably fit without causing a problem.

Go has not-so-simple parts

Go has not-so-simple parts too. One wonders why they thought defer/panic/recover was a good feature:
http://jeapostrophe.github.io/2013-09-23-dpr-post.html

Simplicity is nice to call for, and it might encourage people to think before introducing /accidental/ complexity, but there is /inherent/ complexity in a great many real-world problems. A language that allows you to dig down into the messy details to attack inherent complexity can either be considered complex or impoverished, depending on what its normal mode of expression is. It would be considered complex if you mostly code at a high level, but it has additional features to go lower level. It would be considered impoverished if you always have to stay lower level.

Extensible languages like Racket allow users to create their own languages as libraries, to make certain modes of expression high-level to a domain-specific point - much higher level than the base language, which is already high-level.
Going lower level means using unsafe primitives exposed by the runtime to do complex manipulations. That means the runtime has to be well-equipped to provide low level interactions, which is not (and arguably should not be) user-extensible.

Is a language that provides (say in its standard distribution as libraries) 50 different domain-specific features a kitchen-sink? It probably would be considered that if it were [in deep voice] part of the language [/deep voice], but as libraries maybe not so much. What if the language implementors were clever and bootstrapped the implementation so that those features were written in the language itself? Maybe they use the library functionality? As a developer, I like not having to reinvent everything - I'd like to have them available to use. If they're part of the language or a standard distribution of libraries... what does the difference matter? I like my kitchen sink. It has a hose faucet for easy rinsing. It's much better than heavy buckets of well water and a ratty wash cloth (or in a pinch, a T-shirt). I don't always have to use the hose or garbage disposal, but it's nice to know they're there if I need them.

What would Forrest Gump say about simple?

I mainly want to address your question about languages versus libraries. But I can also suggest what seems simple and complex to me, as a matter of personal taste, in case this helps you see how features like Go's defer-panic-and-recover strike me. This necessarily has a shades-of-gray quality to it. Simple is a relative concept compared to other things that seem more complex in direct comparison. (I really shouldn't do this, but it doesn't hurt as long as I know I won't follow this up with more discussion about what is simple, and what's complex. This is all my contribution here for now.)

A language can be complex in syntax, semantics, runtime definition, quantity of control flow features, special edge cases, standard libraries, and other sorts of things fitting in none of those categories. The more detail a dev must know, in more places, requiring more words to describe, with more undefined behavior and more potential for self-contradiction in conflicting detail, the more complex a language is. One operative word here is more, and another—not spelled out already above—is messy: lack of clean delineation of edges between details of either syntax or semantics. If syntax allows you to legally express something that seems nonsense or confusing, or if semantics don't have bounds between where one idea applies and another, the result is mess weighing on a developer's mind.

Descriptions of defer, panic, and recover in Go don't look like many words to me, and they don't seem hard to understand. It's okay for a language to have semantic features that can't be implemented using the other primitives in a language. If they constitute magic to a dev who doesn't know assembler, this doesn't make them complex, as long as the rules are easy to follow and most of the time you need not think about them much, and as long as there aren't very many magic effects in total. (Weirdly conflicting magic is bad; consistent magic is okay.) I actually thought those three features were expressed in cleverly few words compared to what you might say about exceptions and unwind-protect in other languages.

Bad cases of complexity can take hundreds of words to explain and still leave you asking "what does that even mean?" at the end even so. (For example, there's at least one in Scheme, for most folks, despite it being over-all a pretty simple language.)

Languages are not responsible for making semantics of computing systems simpler than they are. A developer needs a basic grasp of what an operating system does, how processes work, how code executes generally, when things start and stop, how to manage resource limits, and other basic mechanics of cause and effect in system behavior, along with typical timing and order. As long as a language presents details in a relatively straightforward way, without undue obfuscation and incantation games, this is pretty simple as things go. For example, if threads are supported, this is not complex in a language since concurrency is a fact in most systems, and thus hiding it would be obfuscation. But if you make handling threads hideously difficult and confusing, so a dev cannot do it reliably and still must think about it all the time, that would be complex.

When rules are short, crisp, clear, direct, intuitive, non-interfering, and a few in number, then a developer sees what seems a simple model when only those rules applying to each situation demand attention. The opposite of all those lends to complexity.

Is a language that provides (say in its standard distribution as libraries) 50 different domain-specific features a kitchen-sink?

The kitchen-sink metaphor just means a big pile of unrelated crap sintered together in a jumble, but doesn't sound as vulgar.

A language should be considered separate from its libraries, which should have another name. Smalltalk the language is not the same as Smalltalk the operating system and runtime usually bundled with it. They should have different names. Java the language is not the same as Java the mega distribution of associated libraries. They should have different names; using just one name is marketing.

If you manage to use a name and mean only a language and not its libraries, it's not a kitchen sink design unless it's big and complex all by itself even without any libraries. But if you include libraries with a name, then it's already a kitchen sink, yes. If a language can't be used by itself, and must be used with its big bundle of libraries, then it's not a separate thing and it's a kitchen-sink, yes.

(When I describe stuff, I plan to use a lot of names for separable parts instead of just one name, and this may be irritating. Even if there's some unifying thing to how names are chosen, I think most people won't like that much. A big blob of things grouped as a cohesive app that does something coherent, using those pieces, ought to have its own name. But whole and parts are different things.)

     "The usual phrase is: everything but the kitchen sink," Dex reminded accusingly.
     "Right. Help me out, Ivy," Wil said, then cleared his throat. "Gosh, your library has everything but the kitchen sink."
     Ivy smiled and replied, "It has the kitchen sink, too."
     "See," Wil turned to Dex. "Ivy has a kitchen sink design."
     "Okay, fine," Dex fumed.

Do languages conceptually depend on the computers they run on?

Languages are not responsible for making semantics of computing systems simpler than they are. A developer needs a basic grasp of what an operating system does, how processes work, how code executes generally, when things start and stop, how to manage resource limits, and other basic mechanics of cause and effect in system behavior, along with typical timing and order. As long as a language presents details in a relatively straightforward way, without undue obfuscation and incantation games, this is pretty simple as things go.

This is an interesting opinion that I disagree with. The model of execution of the programming languages I use most (statically typed functional programming languages) is not explained in terms of the machines they run on, but in terms of much simpler (and, I think, more elegant) notions of computation, in particular reduction of syntactic programs into syntactic programs (or other related choices).

This allows me to explain things like, for example, tail recursion, without leaving the comfortable world of program syntax and simple-to-explain notions of computation. I like it. This point has been driven home in great detail by Robert Harper on his blog.

This gives me two things:
- simpler conceptual models to use than today's machines, which are extremely complex (you may choose an idealized model of them, but then you have to justify your simplifying choices and you in fact run against the view that languages should expose the machine model)
- the confidence that I'm not teaching technology, but science. If machines change in the following decades, and they've changed a lot since the lambda-calculus was first invented in the 1930s, my language will still remain a nice tool to use (given its conceptual model can still be mapped to those new machines). There may exist new, different means of expressing computations that are closer to those new machines' model, and thus easier to make efficient (I heard people are trying to have part of the program running in parallel mutate the same parts of memory; that seems crazy, do you think that will work in the future?), but I think I can accept this.

I'm not sure I agree.

I find that I easily understand explanations that are based on the underlying machine (the standard model with stack, heap, pointers, etc.) while explanations that make no reference to the machine tend to be baffling. For me, the fact that an instructor of Haskell typically makes no reference to any actual machine makes it harder to understand, not easier. Some abstract mathematical model may be necessary for teaching Haskell, but I'm simply not a "math guy".

I know how the machine works very well, and since all professional programmers need to know roughly how the machine works, it makes sense to teach people how the machine works. Now, it's a core pedagogical principle that we learn new things by relating the new things to things we already know. In general, students either know how the machine works or they should be taught. So it makes sense to me to teach how programming languages work with respect to an approximate model of the machine. Abstract mathematical explanations, on the other hand, are often harder to learn because they do not relate to something the learner already understands. Not all of us have a talent for "math speak", and even if you understand the math-speak, I still think it can be a useful "anchor" for you to know how the code runs on the physical machine.

After all, different computer languages have vastly differing semantics, and in many cases you won't find an explanation of a given language's semantics in terms of mathematical first principles. But what all languages have in common is that they run on a computer. Therefore, knowing how a language operates in terms of the machine provides a basis for comparing and relating different languages in a student's mind.

That said, the Turing machine has never felt like a useful model to me. I mean, ticker tape, come on! Not only would a real Turing machine be fantastically large and slow, its memory model isn't random-access in the way real memory (roughly) is. Since it's lower-level than a real machine, it's hard to relate a Turing machine to real-life code or to a familiar real-life computer. The lambda calculus, at least, looks something like code, and can almost be used directly.

Anchors can hold you back...

Dijkstra once famously said: It is practically impossible to teach good programming to students that have had a prior exposure to BASIC: as potential programmers they are mentally mutilated beyond hope of regeneration.

I imagine that similar applies to people who first learn CPUs, pointers, and stacks. If our first exposure serves as an 'anchor' for everything else we learn, i.e. because we frame new ideas in terms of old ones, it seems wise to seek a better anchor - one that can be easily extended in small steps to understand different concepts.

I'm not sure lambda calculus is an ideal basis. The idea of 'substitution' seems awkward for explaining the broader range of communication principles, IMO. But there are a number of other interesting possibilities: term rewriting, combinators, grammars, intuitionistic logic, etc..

"I know how the machine works very well"

Really? I'm a professional programmer, and have been for 25 years, and I certainly don't claim to know how "the machine" works very well. I've got software running in production literally this very minute on 1000-node map-reduce clusters, virtualized cloud servers, developer desktops (you might be a user of mine!), and smartphones. These almost all target the same instruction set (JVM), but the actual machines involved couldn't be more different. Moreover, they are different in ways that I don't know, don't particularly want to know, and (when it comes right down to it) often _can't_ know.

Most developers who claim to know how the machine works and to love programming close to it actually have a solid mental model of the PDP-11. That's the last physical hardware that was really particularly close to the Von Neumann model. I think they sold the last one around the time I started high school. All of the machines that I'm actually shipping on go out of their way to kind of _fake_ being PDP-11s (but better!), using ridiculous amounts of technology to do so. However, if you look under the covers the actual hardware is an insanely long way away from doing anything like what a PDP-11 ever did, in much the same way an F-35 fighter plane is a long way from being a Model T. Built-in threading, virtualization, pipelining, hardware virtual memory, instruction reordering, vector operations, multi-level caches, offloading computations to GPUs or even clusters! All those wonderful tricks that those hardware guys have come up with to make _sand_ fake being faster and faster PDP-11s! I try to keep up, but in practice as a working programmer I don't, and I can damn well guarantee that my peers don't.

If you can manage all of that hardware knowledge and still keep up with the demands of shipping code, I can only say I'm extremely impressed, and would love a résumé when next you are on the market. For myself, I will humbly admit that all of my mental models of computation are actually virtual, and see where that takes me.

(That came out snarkier than I intended, both for better and for worse. Assume I wasn't trying to actually insult you.)

Languages are not

Languages are not responsible for making semantics of computing systems simpler than they are.

That's an odd way of saying it.

For obvious reasons, languages cannot diminish essential complexity associated with a problem. Valid concerns - such as resource management - should be addressed. But I don't see much reason to expose the considerable accidental complexity of Von Neumann architectures, unsafe machine codes, signals, and poorly designed concurrency models.

Will you be having tea in the drawing room, sir?

(I didn't want to follow up on this, but you said that so reasonably.)

That's an odd way of saying it.

Good eye. Yes, I agree I said it oddly. That sentence was more problematic and edgy than others. I wanted to skip over several questions along the lines of, "How much simpler is a language obligated to make things?" But I didn't want to bloat prose with fussy provisos and dry exactitude running to lots of words. My statement is broken without "responsible for" and "semantics". A language can make things simpler—that sounds good. But when it can't hide semantics, I don't see how that automatically makes it complex.

By semantics I meant broad strokes of meaningful general organization, as opposed to endless bookkeeping detail attendant upon managing specific organization. It's pointless to hide semantic complexity of "memory exists", but a good idea to hide yak shaving detail like "exactly how this gets laid out and where."

I meant specifically to argue the opposite of this statement: "Unless a language hides every sort of complex semantic in computing systems, at all levels, then it loses and is complex because a simple language is obligated to hide everything beyond admitting some computation occurs."

If a tool is like a servant, you don't want aggressive opaqueness, like a butler who refuses to admit whether guests are somewhere about when you ask if they got settled into their quarters. A position of "I can neither confirm nor deny whether a Von Neumann machine has been seen around here" is just obstructive, and obligates a dev to use additional tools to get answers.

strange dichotomy

You seem to be arguing a dichotomy between hiding complex semantics vs. exposing machine details. There is a third option: to expose a rich semantics that is distinct from the host machine(s) - e.g. via bytecode or abstract machine - then compile or define the language relative to this target. Then introduce a separate translation layer (compiler, JIT, or interpreter) between the abstract machine and the hosting environment.

I do not see hiding whether a Von Neumann machine is present to be 'obstructive'. How is such a question or answer useful? It seems potentially more harmful, in the same sense that ad-hoc introspection or reflection may hinder reasoning about subprogram behavior.

Familiarity

McCusker has a point when he mentions that it's easier to transition between languages that use a common concept of computation, and that there is a relatively-widely-accepted consensus on what the "ideal machine model" is for some of the current mainstream languages (roughly: what you needed to know when you used C, watered down a bit).

To take a concrete example, I never teach ML references in terms of pointers. I simply describe them as "a box" with identity, where we can read or put a value. But if a student comes with a question and rephrases one of my explanations in terms of pointers, I will not correct him, and can even continue using this unnecessarily-low-level model for the rest of our discussion.

Marketing to dev mental models

I mainly aim to say the following is a good idea. Next I describe a dev-cognitive-model mainly so you can make the cited good idea succeed via tooling superior to a dev's inferior tools of last resort.

There is a third option: to expose a rich semantics that is distinct from the host machine(s) - e.g. via bytecode or abstract machine ...

Abstractions have pros and cons (like most things), and benefits and costs might disproportionately go to folks with different roles. From abstraction, a producer might get more benefit while a consumer gets more cost. For example, a dev as consumer might be forced to debug in code after translation from nice abstraction to concrete host env. This seems likely as opposed to unlikely, unless tooling is so superb that debugging in assembler always seems superfluous.

I'm used to there being erasure of important semantic information in translations to final concrete form. A dev is likely to put most trust in whichever tool can be instrumented to give the most (apparently) reliable evidence about runtime behavior. If this isn't the high level tool, then it is held in suspicion with a guilty-until-proven-innocent flavor. Here abstraction is judged obstructive when a lower level informs better, and when mapping between levels is hard and (via evidence) contradictory.

I know a lot of devs who view high levels as BS when they can see a lower level. Telling them they don't need the lower level might evoke laughter. The following is slightly related, though less extreme in model translation.

I get that sort of reaction in conversation with C devs when I talk about a transpiler from normal synchronous C (SC) to asynchronous C (AC) targeting a fiber library. Ned with 30 years of professional programming experience (yes really) says he has seen this before, and the original tool before translation always gets abandoned after folks edit the output and cannot go backwards to original source. I say you should be able to get anything you want to appear in AC as if you edited it by hand, using a "do exactly what I say" mechanism in SC. But Ned just says good luck getting folks to follow rules.

debugging down under

Your advice seems reasonable. But it is unclear to me how well it holds in this world where JIT compilers have become so popular. Are you accounting for the relative difficulty of operating at the assembly layer?

I'm used to there being erasure of important semantic information in translations to final concrete form.

I'm most interested in designs that maintain semantic information as long as possible. Microsoft's CLR offers a better example of this than Java bytecode, maintaining a great deal of type information in the assembly. Recently, those clever F* developers at Microsoft Research have also achieved full abstraction when compiling to JavaScript, which I find very impressive given the security goals of F*.

The bytecode I'm developing doesn't express types directly, but will enable strong static type inference for a relatively expressive type system. I'm trying a non-conventional idea of 'flipping' a language: I typecheck after linking and compiling to my bytecode.

G'day

(The expert assembler guy in my group is Australian, so I found your title funny.) The tone of my reply here is intended as random/free-associating discussion.

But it is unclear to me how well it holds in this world where JIT compilers have become so popular.

I lack data there since I have not developed where JIT is used, and none of my coworkers appear to have worked where JIT is used, so they tell no stories. In addition, I don't hear stories (I'm interested so I would notice) about debugging that layer when I see folks online discuss code written where JIT is used. How do folks profile accurately when the assembler might vary from run to run? Can it thus vary?

Judging from a lack of evidence is dubious. But absence of signal from folks debugging JIT output is interesting enough to conjecture either 1) JIT code is always perfect, or 2) folks don't solve a problem serious enough to require debugging, or 3) debugging this is avoided anyway, even if it's serious, because (insert personality fault of dev here). In short I'm guessing. But I would enjoy hearing any story like this folks have to tell. :-)

The role of guessing in development seems rarely discussed. I have a standard lecture I give fellow devs frequently — almost every time I'm asked why XYZ didn't happen when code was tested before checkin. I say they are debugging a negative — that something didn't happen — so it's hard to reason about when there's no evidence. What they need is evidence about what happened instead. Something along the chain of requirements leading up to XYZ fell through, so go find out which one. "No one got back to me," for one of the parts is the same problem, though. I especially hate profiling latency due to "no one called our part of the system" with no hot spots in the code that's just waiting.

Guessing comes up often in low level debugging, starting with a stack backtrace and a core file. What could have caused this observed end to our process? (What does this assert down inside free() mean? Your heap is corrupt.) Hypothesis generation for a plan to gather evidence seems to involve Bayesian reasoning, pattern matching with past experience, and creative thinking.

Are you accounting for the relative difficulty of operating at the assembly layer?

Probably not. I only ever understood 68K assembler well, since I spent three years stepping through 68K asm under Macsbug (technically it was named Opusbug because the microkernel was named Opus after a cartoon penguin in Bloom County) while mentally mapping this back to the original C++ in the very early 90's. I don't try to understand Intel assembler. If I suspect code generation, I ask someone else if it's right. (I would not recommend using C struct bitfields that are 8 bits wide in some old versions of gcc, because the C compiler can get refreshes wrong; apparently byte-oriented instruction optimizations can confuse it.) A reason to suspect code generation is when you look at registers/memory and say, "This value cannot possibly end up here from that sequence of statements." It's almost the only reason we look. Threads are another usual suspect when it's not thread-local storage.

The sense of your comment might be, "Debugging the assembler layer is hard." Yes, which is why it's important that not be necessary. Each time you go down to a lower layer, debugging gets much harder, so costs can be greatly magnified. Saving time of X in developing a high layer might cost you time of C*X for some constant C in a low layer, even if the problem happens relatively rarely. If the cost of debugging a tool exceeds its value, and there's an alternative, the tool is usually dead as soon as devs figure it out, unless they're already in too far to get out (which is admittedly common).

specialisation vs generalisation

I would call the situation where you have different code that runs depending on the type of an argument "specialisation", whether that occurs by type-case, overloading or sub-classing. This is because code specialised to the type in question is executed, which can have different behaviour depending on the type. Contrast this with generalisation, where the same code runs no matter which type is the argument, whether this happens through polymorphism or a base class.

Sometimes these are linked, like in addition where the specialisation of the addition operator allows it to be used generically on say floats and ints, and sometimes not, like a generic container that does not need to know anything about its contents, apart from some universal property like object size.
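
A small Haskell sketch of those two cases as I read them (the names are mine, just for illustration):

-- specialisation: (+) dispatches to per-type code via the Num class
addInts :: Int -> Int -> Int
addInts = (+)                 -- machine-integer addition instance

addDoubles :: Double -> Double -> Double
addDoubles = (+)              -- floating-point addition instance

-- generalisation: the container code never looks inside its contents
count :: [a] -> Int
count []     = 0
count (_:xs) = 1 + count xs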

Dispatch

To me specialization means taking a generic semantics, and specializing it to a specific use case where it can be optimized (without changing the semantics). For example a sort function may be specialized to the case of integers and their usual increasing order; it will essentially be a partial evaluation of the generic algorithm with some known inputs.

I would rather use the name "dispatch" to express the fact that, depending on the type, distinct (and possibly "incompatible") semantics may be chosen. Dispatch may be dynamic (as in the case of subtype polymorphism, or type-case on the dynamic type representation) or static (overloading or type-classes), but I think it carries the meaning that outputs of distinct inputs are unrelated better than "specialization".
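
In Haskell terms, a hand-rolled illustration of that reading of "specialisation" (names are mine): the generic algorithm is parameterised by an ordering, and the specialised version is just the generic one with a known input fixed.

import Data.List (sortBy)

sortGeneric :: (a -> a -> Ordering) -> [a] -> [a]
sortGeneric = sortBy

-- specialisation: partial evaluation of the generic algorithm with a known ordering
sortIntsAscending :: [Int] -> [Int]
sortIntsAscending = sortGeneric compare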

Are they not the same thing?

Are they not the same thing? The compiler optimising the sort routine is choosing specialised implementations based on the data types involved, and the concepts (as in Stepanov) for example that integer is ordered. This may be implemented in a library providing the overloaded functions or internally in the compiler.

Parametric vs ad hoc?

This sounds like the distinction I understood Strachey to be getting at with his distinction between "ad hoc" and "parametric" polymorphism. Parametric gives a rule for doing things that applies to different types, whereas ad hoc gives entirely separate rules for doing things for different types.

Indeed

I understand this sub-discussion to be about vocabulary; the words "parametric polymorphism" come from Greek, and "ad hoc" is Latin, and people say that sounds too complicated for the working programmer. I'm fine with "generics" as a less scary replacement for "parametric polymorphism", and I think "dispatch" would be reasonable for "ad hoc"; its etymology is distant enough for it to sound pleasingly American.

Ad hoc polymorphism

Ad hoc polymorphism corresponds to overloading, not dispatch. What you are referring to is, I thought, called "polymorphic dispatch," where we use the adjective form as in "polymorphic recursion."

Overloading is dispatch

Overloading is type-based dispatch. Depending on how powerful your language is it can be purely static or it can be full-fledged dynamic. Haskell is an example of the latter.

The separation of (static) overloading from (dynamic) dispatch, as seen in many mainstream languages, is more an artefact of OO limitations than a natural phenomenon.

I've never heard of any one

I've never heard of any one refer to dynamic dispatch as ad hoc polymorphism before. I hate words that are defined to mean everything, is that what they mean by "ad hoc?"

Inclusion, not equivalence

Ah, I didn't say that "dispatch" is ad-hoc polymorphism, only that ad-hoc polymorphism is (a form of) dispatch. There are other forms of dynamic dispatch, including, for example, plain first-class functions. In general, the notion of dispatch neither implies nor requires polymorphism, though many people seem to confuse the two.

There are other forms of

There are other forms of dynamic dispatch, including, for example, plain first-class functions.

Ah, but this is just ad hoc polymorphism over the type of function activation records. ;-)

erudition

Well, the terms are rather erudite, yes; unsurprisingly, because Strachey had a very strong liberal arts background (how could you not, coming from that family?). I always figured he chose the term "ad hoc" with conscious or unconscious intent to denigrate; specifying a rule that works for all types is so much more elegant than specifying a separate rule for each type, after all. (And this is the same guy whose 1967 lecture notes say "basic irrelevance of syntax and primacy of semantics".)

Strachey's terms

I was told that he was responsible for popularising the term "Currying". Such a great shame, "Schönfinkeling" just sounds so much better.

Now that you mention it,

Now that you mention it, Schönfinkeling has a certain appeal :-D. But currying brings to mind —well, I'm guessing I'm not alone in this— the use of a stiff-bristled brush to untangle a horse's hair, which has always struck me as a somehow fitting analogy (one of those odd eponymic coincidences, like the Shell sort).

Generic functions

At least Common Lisp, Dylan, and Factor [and probably G'Caml] use generic to mean dispatch though.

Definition of 'generic'

Don't know who observed it originally, but I always found this definition quite accurate (paraphrased):

"'Generic' always refers to those forms of polymorphism that a person's favourite language does not (yet) support." (...or has acquired late, one might add today.)

Too true. No doubt the use

Too true. No doubt the use of "generics" for "polytypism" in Haskell causes some confusion because it's completely at odds with generics in mainstream OO languages.

Separate concepts

It is worth separating these concepts:

  • different user code is reachable based on input type (dispatch)
  • compiler emits code optimized for a given use-case (specialization)

These are related in the sense that the former is often an opportunity for the latter. When specializing a sort, one might inline a comparison function and sizeof() values that are selected based on argument types. But we shouldn't conflate them.

What about a library that

What about a library that emits different code optimized for a given use case? Consider linear maths libraries that use meta programming to emit optimised matrix calculations. I think compilers need to be an enabling platform so that domain experts can write optimising libraries for their area of expertise, which can be used by other programmers. In this case the only distinction between the first statement and the second is the point of view of the programmer. The first is something this programmer wrote, the second is the same thing but written by someone else. Maybe optimisation does not belong in the compiler at all?

specialization by hand

If programmers develop multiple implementations of the same function/behavior, and select between them based on the arguments for performance purposes - e.g. switching from merge sort to linear sort based on input size - this could be considered a form of specialization on a function, implemented by hand.

But specialization is still not the same (in concept or utility) as dispatch. Specialization involves multiple implementations of ONE function, and it is difficult to find a motivation for this other than performance. Dispatch involves selecting between functions/behaviors which may have different properties (modulo a few laws or invariants) in context of different types.
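
A tiny Haskell sketch of hand-written specialisation in that sense (the threshold and names are mine, purely illustrative): one observable behavior, two implementations chosen by the argument, for performance only.

import Data.List (insert, sort)

smartSort :: Ord a => [a] -> [a]
smartSort xs
  | length xs < 20 = foldr insert [] xs   -- insertion sort for small inputs
  | otherwise      = sort xs              -- library merge sort otherwise
-- both branches return the same sorted list; only the cost differs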

I agree with the general sentiment that developers should be able to manage performance in a language. And meta programming can certainly aid in this endeavor.

Maybe optimisation does not belong in the compiler at all?

Optimization will always be part of the compiler, if only because compilation occurs in multiple stages with intermediate languages, and each stage is opportunity for more optimizations. Programmers are only able to manipulate one surface, and it is difficult to perform deep optimizations by hand. (And expensive to maintain.)

Unorganized thoughts

Specialization involves multiple implementations of ONE function, and it is difficult to find a motivation for this other than performance.

Sounds like pattern matching:

len [] = 0 -- specialize to empty case
len (x:xs) = 1 + len xs -- specialize to non-empty case

Why is that wrong usage? How do C++ template specializations fit in? Those are also certainly used for non-optimization purposes.

Optimization will always be part of the compiler, if only because compilation occurs in multiple stages with intermediate languages, and each stage is opportunity for more optimizations.

Agreed, but that doesn't mean optimizations shouldn't be in userland. The compiler shouldn't be such a black box.

Template specialisation is type indexing

Templates generally correspond to parametric polymorphism (albeit poorly). Template specialisation, then, corresponds to type-indexed definitions. (At least if you specialise type parameters. When you specialise value parameters, it's value-indexing, which is somewhat reminiscent of pattern matching, though on the type level.)
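
A loose Haskell analogue of a type-indexed definition, in the spirit of specialising vector<bool> (this is my own sketch using type families, not a claim about how templates work):

{-# LANGUAGE TypeFamilies #-}

newtype BitVector = BitVector Integer   -- hypothetical compact representation

type family Store a                     -- representation indexed by element type
type instance Store Bool = BitVector    -- the "specialised" compact case
type instance Store Int  = [Int]        -- an ordinary case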

Pattern matching isn't

Pattern matching isn't specialization. If you separated the pattern cases, you wouldn't have even one complete function, much less multiple implementations of a single function. Statically eliminating unnecessary patterns based on contextual knowledge of the possible inputs, however, would be a case of specialization.

Metaprogramming, including template metaprogramming, can be used for a number of roles other than specialization. It's reasonable to call it "template specialization" when you, for example, implement a bit-vector for booleans with the same interface and behavior as a more general vector class instantiated for booleans... but a more compact representation. (This was the original intention for "template specialization", hence the name.) But if you implement a Fibonacci function in templates, they simply aren't being used for specialization.

I've not argued against userland optimizations.

Agree

Here's my interpretation what you're saying: specialization is generating code for a special case of a general pattern. This generally isn't a language feature. The language feature that enables manual specialization is branching on values and/or types. A language implementation might also offer automatic specialization for purposes of optimization.

The main point I was hinting at is that specialization as you're using the term (which seems reasonable to me) isn't a language feature.

Language Features for Manual Specialization

I've studied a few languages designed for manual specialization. Granted, they're somewhat esoteric, such as Joseph Goguen's BOBJ.

Manual specialization expressed in terms of tweaking/re-implementing the model introduces a huge challenge with regards to observational equality. It also has an interesting interaction with otherwise non-deterministic reduction semantics (e.g. in term rewrite systems) because a specialization may freely be more deterministic than the generalized pattern.

But there are useful forms of manual specialization that do not tweak the model. In particular, I believe that partial evaluation wielded pervasively and precisely - e.g. via programmer annotation, with warnings when a requested specialization fails - could be a very simple, effective, and practical basis for manual specialization across languages that have sane effects systems.
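
As a toy illustration of specialization by partial evaluation (names are hypothetical; a real system would produce this mechanically from an annotation rather than by hand):

-- The general function.
power :: Int -> Int -> Int
power 0 _ = 1
power n x = x * power (n - 1) x

-- power specialized to the known exponent 3, by unrolling the recursion;
-- this is what a partial evaluator would generate automatically.
power3 :: Int -> Int
power3 x = x * (x * (x * 1))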

What about staged

What about staged programming constructs?

What about a user interface

What about a user interface where a string gets a text box, and an integer a slider? Is this not specialising the UI for the type? Could you not consider this an automatic optimisation of the input field by a UI library, implemented by a domain specialist? The purpose of the function is the same (request input). The name of a function normally describes its purpose, so dispatch on a function over types is all specialisation - unless random functionality is collected together under the same name, which is surely bad organisation?

For the other point, most (all?) compiler optimisations can be represented as pattern matching on the abstract syntax tree followed by a transformation of the matched subtree. If the input language allowed directly expressing these transformations, then the compiler would consist of a parser and a code generator, with tree pattern matching and transformation in the middle. All the optimisations could be read in from libraries written in the source language, on the condition that the source language can represent tree pattern matching and transformation directly, and that the compiler provides data types corresponding to the internal abstract representation (or multiple representations if several stages are used). User supplied optimisations could also be applied.
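
For instance, a minimal Haskell sketch (the AST and rules are hypothetical) of an optimisation expressed as tree pattern matching plus transformation, the kind of rule that could live in a library rather than inside the compiler:

-- A tiny expression type standing in for a compiler's internal AST.
data Expr = Lit Int | Add Expr Expr | Mul Expr Expr
  deriving Show

-- One user-suppliable optimisation pass: algebraic simplification by
-- matching subtrees and rewriting them (a single pass; a real optimiser
-- would iterate to a fixed point).
simplify :: Expr -> Expr
simplify (Add (Lit 0) e) = simplify e        -- 0 + e  ==>  e
simplify (Mul (Lit 1) e) = simplify e        -- 1 * e  ==>  e
simplify (Mul (Lit 0) _) = Lit 0             -- 0 * e  ==>  0
simplify (Add a b)       = Add (simplify a) (simplify b)
simplify (Mul a b)       = Mul (simplify a) (simplify b)
simplify e               = e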

Text box vs. slider based on

Text box vs. slider based on string vs. number is a clear case of type-driven dispatch. You are not 'specializing' a more general function.

Dispatch on a function over types is not specialization. A compiler might specialize a dispatch function - e.g. by inlining the function and removing irrelevant cases in context. But doing so isn't necessary. A compiler may freely use dynamic dispatch even for static types, i.e. by adding an extra argument.

User supplied optimisations could also be applied.

Indeed. We could have a more extensible compiler, and we could have language features or conventions to identify some functions as optimizers or suggest where the optimizations should be applied. This is a strategy I favor, even if only to simplify the process of proposing and testing new optimizations.

Did you mean "optimisation does not belong in the compiler" in the same sense as "plugins do not belong in the browser" and "PCI cards do not belong in my computer tower"?

Static vs Dynamic Dispatch

I see where the difference in perspective is coming from. In a language with no dynamic dispatch (on types), all overloading is resolved at compile time and is therefore always specialisation, and hence the distinction between dispatch and specialisation disappears.

Whilst extending the compiler is useful, I was thinking that all optimisation could be removed from the core, thus simplifying the compiler itself and pushing all the optimisations into libraries, where they can be more easily studied and modified.

all optimisation could be removed

all optimisation could be removed from the core, thus simplifying the compiler itself and pushing all the optimisations into libraries, where they can be more easily studied and modified

I'm all for modeling optimizers as a library of distinct software components that can be parameterized, pipelined, studied, and modified. I also like bootstrapping, such that the whole compiler is a library - perhaps able to support runtime compilation and plugins.

But it isn't clear to me how you imagine separating such from 'the compiler'.

A typical compilation pipeline is `Haskell ⇒ GHC Core ⇒ LLVM ⇒ x86-64`. It may be tweaked a bit for portability reasons, e.g. compiling Core to C, or LLVM to LLJS. I think it isn't very sensible to have LLVM optimizers scattered through my Haskell code; that's the wrong level of abstraction. Also, if we have a pipeline of optimizers at a given level (e.g. for LLVM) then they are generally not commutative, i.e. `optA ∘ optB` is not the same as `optB ∘ optA`.

So we work with multiple layers, different layers for portability, and different ordered configurations of optimizers within each layer, and we'll tend to separate optimizations at least for lower layers. Managing all these configurations becomes a hassle, and after a bit we'll tend to standardize them - e.g. into optimization "levels" and options for cross compilation. These configurations then - in practice, regardless of whether they are maintained in libraries - become the 'core' compiler, with all the attendant complexity and centrality.

Ultimately, layering and ordering prevent us from separating optimizations from the compiler. However, we can still improve our ability to study, modify, and experiment with new optimizations.

Separating compilation from optimisation

High performance libraries often include platform specific optimisations, and I think it would be much better to express the algorithm in a high-level language and then provide platform specific transforms, rather than using a pre-processor to select platform specific assembly language implementations as is currently done.

I could imagine starting with something like typed assembly language, and using the type system to control register and memory allocation (as in effects and resources). The type system could operate like a proof assistant, making sure that the optimising transforms did not change the meaning of the code.

The high-level program and the low-level one would be expressed in the same abstract syntax, so transforming the source program into the destination would consist of solving a logic problem in the type system from the available facts (machine instructions and optimisation transforms). The compiler would consist of a language-specific parser, a machine-specific code generator, a type inferencer for the internal abstract syntax, a type checker to make sure transformed code is still valid, and a logic solver to find implementations using an external library of transformations.

This paper discusses the declarative method of writing a compiler: http://comjnl.oxfordjournals.org/content/34/1/64.full.pdf It would seem the major problem is performance, but maybe a specifically written type system would perform better than a generic logic solver like Prolog? Maybe someone has already published some research into this? I would be interested to know of anything out there.

Bedrock

Sounds a lot like Bedrock: (relatively) higher-level layers built, inside a proof assistant, on top of assembly language.

Bedrock

Based on your description of a declarative assembly based on tactical theorem proving, I think you might be interested in Adam Chlipala's work on Bedrock.

I agree that we should avoid a separate pre-processor language. But that isn't very difficult. A concept of statically computed values and static choice would do the job.

The typed concatenative bytecode I'm developing, called ABC, doesn't have a concept of registers. However, I was inspired by tactics-based theorem proving in Bedrock and Coq to pursue a similar feature. So I've created an 'ambiguous' form, called AMBC, that extends ABC with characters '(', '|', and ')'. Code of the form a(bc|d)e(f|g|h) would have six possible expansions: abcef, abceg, abceh, adef, adeg, and adeh. However, ambiguity can be partially resolved in context, i.e. by eliminating expansions that don't make sense in the type system. Further, there are mechanisms to annotate these choices so we can heuristically choose from among the sensible ones.

But my motivation isn't performance. It is more about rapid prototyping, robust adaptation, self-healing code. With ambiguity, we can more directly express pseudo-code and refine it. Our code is more likely to bend than break as we shift context around it.

Text box vs. slider based on

Text box vs. slider based on string vs. number is a clear case of type-driven dispatch. You are not 'specializing' a more general function.

I don't think the line is clearly delineated so far, so let me try: specialization is some sort of partial evaluation, often driven by types (but in a type-agnostic way).

Dispatch is a runtime type-indexed operation.

You can dispatch to partially evaluated code, and you can partially evaluate dispatch code, but given these definitions, there's a pretty clear line between the two.

If your type-specific code does not perform partial evaluation, it's dispatch.

If your type-specific code performs partial evaluation, but is agnostic over the specific type, it's specialization.

And of course, you can have both whereby the specialized implementation is selected at runtime via dispatch.

Static dispatch

Consider that I'm replying both to Keean

In a language with no dynamic dispatch (on types), all overloading is resolved at compile time and is therefore always specialisation, and hence the distinction between dispatch and specialisation disappears.

and naasking

Dispatch is a runtime type-indexed operation.

Dispatch can also happen at compile-time rather than runtime. Consider the semantics of Haskell type-classes.

If a generic function uses an overloaded operator (a member of a type class), there is not enough local typing information to resolve the overloading, and the generic function is abstracted over a "the here-unknown type parameter must support this operator" constraint.

When the generic function (or the overloading operator directly) is used in a context where types are precise enough to resolve the overloading, the code corresponding to the instance at that type is injected in place of the operator.

This means that "dispatch" in this case happens statically, as a case analysis on the set of possible instance declarations that may match the type at the resolution site -- which can be distant from the site of use of the overloaded operator.

For example, if you define add a b = a + b, the variables will be given an unknown polymorphic type a, and the use of the overloaded operator (+) of the class Num will add a constraint to the inferred type, Num a => a -> a -> a. At this point, we are still using (relativized) parametric polymorphism.

If you later use add (1 :: Int) 2, a form of "dispatch" will solve the Num a constraint by generating code for the Num Int instance. This is where type-classes get their ad-hoc semantics. It may then specialize the code by inlining add, unboxing stuff, etc.

This capacity of abstraction over ad-hoc constraints to delay resolution is a decisive property of Haskell type-classes that gives a dynamic "feel" to this inherently static process of type-driven code inference. I think it is also The Right Way to do type-based overloading -- while details of the precise specification of the type-classes mechanism can be discussed, and need not be tied to the compromises made in Haskell.

Type Class Dispatch and Partial-Evaluation are Equivalent

I agree with this overview: there is an equivalence between partial evaluation and dispatch on type where type classes are concerned. "add a b = a + b" is a generic function, and compared to "addInt" and "addFloat" it is generalised. There are two specialised versions of '+', and the compiler chooses the appropriate specialised version of '+' based on the types at the call site of 'add'. You could view this as partial evaluation of 'add' by optimising out the implicit dictionary argument of 'add'. By the previous argument, can we refer to this partial evaluation as specialisation of 'add'?

Nope; dispatch happens before specialization

It is not correct that `addInt` is a partial evaluation of the generic `add`. If you look under the hood, after type-classes are resolved, `add` takes three value-level parameters, the first being the dictionary for the `Num` class of the type variable `a`.

When you apply `add` to a specific-type argument as in `add (1 :: Int) 2`, the code is elaborated into something like add IntNumInstance 1 2, where `IntNumInstance` is the dictionary for the instance `Num Int`, represented as a value in this intermediate language. Filling the blank in `add ? (1 :: Int) 2` is when the dispatch happens. Specialization only comes later: inlining transforms this into `IntNumInstance.(+) 1 2`, and this record access beta-reduces to a constant call to a known function (or in practice inlines and reduces further). This is where partial evaluation happens, and it is *after* dispatch.
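
A rough Haskell rendering of that intermediate form (the record and names are illustrative; GHC's actual Core differs in detail):

-- The Num dictionary as an ordinary record (only (+) shown).
data NumDict a = NumDict { dictPlus :: a -> a -> a }

-- The instance Num Int, represented as a value.
intNumInstance :: NumDict Int
intNumInstance = NumDict { dictPlus = (+) }

-- add after type classes are resolved: the dictionary is an extra argument.
add :: NumDict a -> a -> a -> a
add dict a b = dictPlus dict a b

-- Dispatch: fill in the dictionary blank at the use site.
example :: Int
example = add intNumInstance 1 2
-- Specialization comes afterwards: inlining add and the record access
-- reduces this to (+) 1 2 at Int, with no dictionary left at runtime.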

Okay, that makes sense.

Okay, that makes sense. Where does that leave my original comment? In the above example the dispatch is trivial, as there is only one generic definition of 'add'. That means the choice between 'addInt' and 'addFloat' happens during specialisation, hence you could call them 'specialised implementations' of 'add'?

When does static become dynamic?

You could interpret an OO language in much the same way. If we have a function like this:

foo :: Foo -> Bar
foo a = a.foo()

Then this gets "resolved" to parametric polymorphism by the compiler to this:

foo :: (exists a. (a,FooVTable a)) -> Bar
foo (a,vtable) = vtable#foo a

Where # is record lookup.

[[Compare to how type classes get resolved:

add :: Num a => a -> a -> a
add a b = a + b

becomes

add :: NumVtable a -> a -> a -> a
add vtable a b = vtable#(+) a b

]]

With OOP the actual dispatch takes place statically at object construction time. When the compiler sees

class Baz : Foo { ... }

new Baz(...)

it is replaced by:

(BazConstructor(...), BazVTable)

Therefore I think the critical difference is not where the type dispatch happens, i.e. where the vtable gets inserted into the program, but how the vtables flow through the program. OOP vtables are attached to values, whereas type class vtables are attached to variable scopes. This is much like how in a dynamically typed language types are attached to values, and in a statically typed language types are attached to variable scopes.
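
A rough Haskell sketch of that contrast (hypothetical names; the ExistentialQuantification extension is assumed): in the first version the vtable travels with the value, in the second it is supplied by the use site's scope as an implicit dictionary.

{-# LANGUAGE ExistentialQuantification #-}

-- A hand-written vtable.
data FooVTable a = FooVTable { vFoo :: a -> String }

-- OO style: the value is packaged together with its own vtable.
data AnyFoo = forall a. AnyFoo a (FooVTable a)

callFoo :: AnyFoo -> String
callFoo (AnyFoo x vt) = vFoo vt x        -- dispatch through the attached vtable

-- Type-class style: the dictionary is attached to the calling scope instead.
class Foo a where
  foo :: a -> String

callFoo' :: Foo a => a -> String
callFoo' x = foo x                       -- the Foo dictionary is passed implicitly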

Static or dynamic

You could interpret an OO language in much the same way

Yes, I find your explanation rather convincing. I do not attach high importance to the question of whether this kind of dispatch should be called "dynamic" or "static".

I would say this (either OOP or type-class) is definitely distinct from "really dynamic dispatch" such as if (foo.IsInstance(Bar)) ... else if (foo.IsInstance(Baz)) ... else ... -- even if those could in some cases get statically evaluated by partial evaluation (inlining, etc.).

Therefore I think the critical difference is not where the type dispatch happens, i.e. where the vtable gets inserted into the program, but how the vtables flow through the program. OOP vtables are attached to values, whereas type class vtables are attached to variable scopes.

I find this last characterization too vague and possibly wrong (type class dictionaries are taken and passed as implicit parameters of functions throughout the actual call chain, which means their value flow is not lexical). But I'm not sure, when you say "the critical difference", of what you are trying to separate. OOP and type-classes? Static and dynamic dispatch? I think the lines are blurred and don't insist on separating them; I was rather trying to argue for the difference between ad-hoc and parametric, or dispatch and generics.
My point about dispatch sometimes being static was more that dispatch can happen in places where we maybe don't expect it.

Besides, I think in practice it is non-innocuous that type classes are resolved at compile time, as they allow forms of type-level computation that are heavily used (abused?) in the Haskell and Coq communities.

Dispatch can also happen at

Dispatch can also happen at compile-time rather than runtime. Consider the semantics of Haskell type-classes.

As Jules described, I don't consider this "static dispatch". The dispatch is still happening at runtime by indexing a record with a selector, so the jump target is dynamic. It would become a static dispatch only after specialization partially evaluates and entirely eliminates the dictionary indexing, making the jump target static.

If there's more than one possible target in a whole program, this specialization just pushes the dispatch further back.

What if Jim suspects his

What if Jim suspects his code is safe that way, but discovers he was wrong the hard way by runtime failure? What does Jim do next?

He wouldn't, since he would already get a type error at compile time like "Type T doesn't support method foo". Then he would either change T to a type that supports foo or not call foo.

I'll try to explain parametric polymorphism here. Suppose you want to write a function that returns its argument. We can do this:

int foo(int x){ return x; }

But this only works for integers. We can write another one:

float fooFloat(float x){ return x; }

However this gets tedious. The general pattern is:

T foo(T x){ return x; }

That's what parametric polymorphism is: the language lets you write T that stands for any type. In some cases the syntax is a little different, for example:

T foo<T>(T x){ return x; }

The <T> is to indicate that T is not a concrete type. Or in Haskell you can simply write:

foo x = x

And the types are inferred.

That's all there is to it. Of course this example is useless since the function doesn't do anything. In practical cases you have a partially concrete, partially generic type:

List<T> reverse<T>(List<T> lst){ ... compute the reversed list ... }

Here too we could have written reverseIntList, reverseFloatList, etc., but like before that gets tedious.
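
For comparison, a sketch of the same idea in Haskell, where the type parameter is written as a lowercase variable and the signature could even be omitted and inferred:

-- One definition works for lists of any element type.
rev :: [a] -> [a]
rev []     = []
rev (x:xs) = rev xs ++ [x]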

An example of what I meant:

Thanks for a good, detailed description. I'm familiar with generics from C++ templates. When I returned to visit my old boss at Taligent around 1994, he showed me the new hotness that was STL, because I was that odd guy who likes language stuff. I had to learn enough ML around 92 or 93 to read Appel's Compiling with Continuations, and afterward I studied a text on general functional programming, so I saw FP techniques in use within STL and blurted, "This looks like functional programming." He confirmed it was compile-time Turing-complete when I asked. (I neither detest nor especially like templates, and typically avoid them when it doesn't cost me.)

I feel this thread is hijacked by generics, when I just wanted to advocate simplicity in Rust and other languages, so I expand further only under mild protest. :-) I'll try to explain the scenario with Jim I meant, to which you replied:

Then he would either change T to a type that supports foo or don't call foo.

A concrete example would be better, but instead I'll lazily describe it as a general pattern. Suppose Jim uses a group of template methods collectively implementing an algorithm, working with types X and Y, and appearing to apply to Jim's type Z too. Operators f, g, and h - all supported by types X and Y - are used to implement a new operator * in the templates, and * must be commutative for the template algorithm to work. Jim's type Z has operators f, g, and h, but when combined in just the way used, the resulting * is not commutative. So it compiles without error, but behaves strangely when run, so Jim sees evidence disagreeing with what he expected. (Substitute associative for commutative everywhere above, if that makes a better example.)

That is indeed a failure of

That is indeed a failure of C++ templates. Parametric polymorphism does not share that problem. To call any operators on the generic argument you need something additional in your language. In OO languages that is bounded quantification, and in Haskell that is type classes. You would add a constraint to the type parameter to make sure that you have the right f, g, h and not just any operators that happen to be named "f", "g", "h" (the constraint would be a subtype constraint in OO languages and a type class constraint in Haskell). C++, on the other hand, does no such checking, and the compiler is happy if it can find any old operator as long as it has the right name. Concepts are supposed to address this eventually.
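
A minimal Haskell sketch of that kind of constraint (the class and names are hypothetical): the constraint guarantees the intended f, g, and h come from a declared instance rather than from whatever happens to share their names, though laws such as commutativity remain the instance author's obligation, not the type checker's.

-- The operations the algorithm relies on, gathered into one constraint.
class HasFGH a where
  f :: a -> a -> a
  g :: a -> a
  h :: a -> a

-- The derived operator built from f, g, h (the algorithm assumes it is
-- commutative; that law is not checked by the compiler).
star :: HasFGH a => a -> a -> a
star x y = f (g x) (h y)

-- Only types with a declared HasFGH instance can be used here; a type Z that
-- merely defines functions named f, g, h will not compile without an instance.
combine :: HasFGH a => [a] -> a -> a
combine xs z = foldr star z xs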

Sometimes it is...

The attached video discusses Self, which really does demonstrate very well that sometimes less is more. Self Video

Mischaracterization of Rust

Sure, Rust is complex. But I think you mischaracterize the reason for the complexity. Why did you come to the conclusion that it was the result of smart people showing off?

Rust is complex because it is tackling difficult problems that other languages do not tackle. They're trying to make a language with efficiency comparable to C++ but with stronger compile-time guarantees.

There are two ways Rust could be screwing up, in theory. First off, maybe nobody actually needs the kind of efficiency they're aiming for? I'm not using Rust because I currently don't have a need for that kind of efficiency; I'm happy to take a performance hit and use a less complex language. However, I do believe that there is an important subset of software that does need C++/Rust-level efficiency.

Second, maybe the additional safety guarantees aren't worth the additional complexity? Maybe people would rather use a simpler language with fewer safety guarantees rather than fight the Rust type system? Or maybe someone will find a way to create a simpler language that gets you the same benefits? This could very well be true, but I haven't seen a convincing argument that this is the case.

Go doesn't have the same safety or performance goals. And even without those I think it lacks well-known generalizations that could have made it a better language: parametric polymorphism (special case: built-in collections), algebraic types (special case: null), tuples (special case: function arguments and return values).

I also skim the Rust mailing list from time to time and get the impression that they care a lot about simplicity and understandability. For example, there was talk a few months ago about removing the concept of a garbage-collected pointer from the language to make things simpler (maybe this work is complete?). I also remember them converting a particular static check to a dynamic check because of the complexity of doing precise-enough static checking. These people seem to care a lot about simplicity and understandability.

I guess my point is that there's no reason for me to believe that the Rust developers are making things unnecessarily complex. Do you have any examples? Or are you just saying that you think their chosen goal is incorrect?

You got me; I'll go along quietly.

I didn't mean to complain about Rust specifically. That would belong on a Rust mailing list, not here. I didn't expand enough on a sentence about simplicity in other languages. When anyone writes about their new language and says it is so simple and here's why, I'm offering to give feedback on making it simpler still—maybe just by editing descriptive prose, who knows.

Why did you come to the conclusion that it was the result of smart people showing off?

I didn't mean Rust got complex just so folks could show off. I mean when folks discuss tech topics (which includes Rust discussion), there's a bias towards complexity that I attribute to two causes: 1) writers showing off, and 2) readers being swayed by complex writing, which rewards writers for doing it. (Past research has shown writing is often graded as having better quality when it favors more math or merely longer words and more complex expression, along the lines of "fancy equals good".) Unfortunately, neither of those encourage simpler software, because there's complexity in the story now that needs stripping away. Fortunately, everything I see pcwalton write about Rust is very good, and I expect him to single-handedly make a big difference, which is encouraging.

I guess more folks might reply asking, "Why are you criticizing Rust?" Fair warning: I might not reply to any of these. I would like to see Rust shape up into a good tool used a lot, so I'm specifically suggesting "more simple please" as if the soup will be fine if we just add less salt. There's something to be said for succeeding just by avoiding fatal mistakes. On a list of things to fear, I want complexity to appear when folks design languages. Anything good enough to attract discussion is going to draw complexity-amplifying discussion. I want you to think of it as entropy that needs scrubbing, that's all.

Or are you just saying that you think their chosen goal is incorrect?

The chosen goal sounds good to me. Also, I have no objection to complex implementation. I'm encouraging future design taste to skew more toward simple, and I'm stooping to fear-mongering to motivate this as a priority. And more than that, I would like to see language designs aim for simple specs and stories about coding and debugging. There will still be plenty of complexity to go around. A common developer bias seems to discount further complexity up ahead, so budgeting doesn't allow for enough.

     "I see what you mean," Crash chimed in mock agreement. "Forth seems too complex to me. What can we do to make it simpler?"
     "Oh, shut up," Wil grinned. "Okay, some languages are simple enough."

i'll 'see' your 1) and 2) and raise ya

actually i mean dang, i must be the PERFECT counter balance. i hate complexity. i can't ever grok it. :-)

Simple languages lead to complex programs

Every now and then there is this plea for simple languages.

When one of them gets adopted, what happens is that after a few language revisions, it becomes as complex as the ones it was trying to fight against.

The world is complex, and I would rather have powerful languages that allow me to extend them via libraries with first-class treatment than plain simple ones.

Simple languages are a utopia, because in the end every single developer needs to replicate in his/her code the abstractions other languages offer for free in their runtimes.

Simple vs. Powerless

I think you're confusing the two! Simple doesn't mean 'rule of least power'.

What simplicity means (to me) is orthogonality and consistency - that there are no 'corner cases' that complicate reasoning, nor any overlap in utility. Naturally, this leads to a certain form of language minimalism: we can't have orthogonality if there are redundant designs in the language. We can maintain simplicity as we build higher level libraries, i.e. by implementing higher level abstractions that fractally have the same attributes of orthogonality and consistency.

Simplicity is orthogonal to power. Simple languages can run the gamut from very powerful (e.g. Scheme, FORTH) to relatively powerless (a diagrams model, or context-free grammars, or an atemporal constraint system). I don't believe the OP is advocating against powerful languages with big libraries.

Bare naked languages without libraries lead to complex programs.

I agree with dmbarbour that simple and powerful can go together. Your title should have been "weak languages lead to complex programs", but that's still undermined by libraries handling complexity.

Yes, simple tools require handling complexity elsewhere. The world is complex, and languages can be powerful while still simple, through elegant basic structure with complexity handled in libraries. Language and library are different things, though. I'd agree a language you can't extend with a library doesn't sound powerful. Do you have one in mind?

Anything can be ruined via sloppiness. A "simple" language might defer a complex issue to a library, then depend on it so the library is required, and make interface and dependency so complex it must be that particular library so it can't be replaced. In such a case it's not a library — it's a part of the language with rotten organization, and it's a complex language. Is that what you mean?

I think we're using different definitions of language. You might mean when you download and install some hypothetical language L, it should come with every library you ever need in one package, so there's nothing for you to do as a developer except instantiate your app and tell it to run. I'd call that a system rather than a language. How is that relevant to a discussion of a simple base language to which you then add libraries?

abstraction theory

Some of the terminology of abstraction theory might come in handy here.

Each extensible language is surrounded by an envelope of possible extensions reachable by modest amounts of labor by unsophisticated users. — [Standish 1975]

A key notion in my study of this stuff is abstractive radius, by which I mean essentially the breadth of the envelope Standish described. Smoothness is the property that the abstractive facilities apply in a free and uniform way; in a smooth language, you can't actually tell whether a facility is primitive or derived. The Smoothness Conjecture in my dissertation says that any roughness in a language ultimately bounds its abstractive power.

If you can't tell a primitive from a derived feature, then applying a bunch of libraries to the base language results in something that is just as much a "language" as the base language was. However, unless the base language had an infinite radius of abstraction, it's to be expected that after applying all those libraries one's language has a shorter radius of abstraction than the original language had. Understanding just what the factors are that cause the post-library language to have a smaller radius, and how to mitigate the effect, falls within my long-term pursuit of abstraction theory. (I really need to write a decent blog post on this stuff.)

abstractive radius & smoothness

I like this terminology; it fits some concepts to which I've never given a name. A good blog post simply describing terminology for abstraction theory would be nice.

I would guess that Standish's 'envelope' has at least one more layer to it: extensions reachable by careful design of visionary architects. You know... the people who write Linq or Rx extensions, lens and pipes libraries, Angular or Node. Such extensions seem to open new volumes accessible with "modest amounts of labor by unsophisticated users".

Smoothness is an interesting property.

It seems to me that a language with sophisticated syntax will generally be 'rough', because uses of first-order syntactic expressions tend to be visibly different from uses of derived abstractions. Ability to manipulate syntax might regain smoothness, but comes with much complication for every tool that processes the language. A language with relatively minimal syntax - e.g. Forth or Lisp - has potential to be very 'smooth' without becoming complicated.

Of course, most languages - even Lisp and Forth - tend to falter with regards to concurrency, distribution, and heterogeneous computing - e.g. shaders and GPU computing. And frameworks often don't fit the usage patterns of common functions, and tend to fail with respect to orthogonality. So these are still sources of roughness. I'd suggest the most important problem we can address for simplicity and smoothness is the ability to abstract and compose frameworks just as we abstract and compose functions.

Agreed

John's remarks on abstractive power are always a bit frustrating. They sound very interesting, but on the other hand they're always a bit too short and abstract (no pun intended) to really "get it". Or rather, I think I understand what he means, but there is not enough to decide between "interesting but ultimately useless idea" and "I want to think harder about this". Maybe it's missing a bit of demonstrated use: what kind of problems can we solve with this new perspective? Did we gain the power to explain/justify things we couldn't grasp properly before?

The terminology itself

The terminology itself allows discussion of concepts that are otherwise hard to put a name to. That's worthwhile in itself.

I find it somewhat frustrating too. Understand, I don't limit myself to studying problems I think myself capable of solving in a "reasonable" amount of time. I've been on my abstraction-theory quest for about a quarter century now; if I were on a deadline I'd have long since abandoned it. Other than the terminology, the most nearly substantive thing I've got so far is a preliminary sort of techreport from 2008, twenty years in. The thing is, I'm still excited about abstraction theory. Why? Not, evidently, because of the spectacular results I've achieved. What excites me is the potential. McCusker started this thread advocating the benefits of simplicity; what if we had formal results we could point to for such a position? It's one thing to say, doing things this way is all peaches and cream, whilst the alternative is icky; quite another to say, doing things this way has this desirable formal property that the alternative lacks. One could take an intuitive justification that only appeals to a certain kind of mind, and expand its appeal to a wider audience who didn't share the purely subjective intuition.

smoothness etc.

Hm, yes, it makes sense complicated syntax would tend to be 'rough'.

I would guess that Standish's 'envelope' has at least one more layer to it

An excellent point, very helpful for what I'm developing on the formal side. I've been frustrated that my treatment was far less simple than I'd like due to the different classes of expressive transformations that could be considered — essentially, different classes of difficulty of rewriting programs of one language to programs of another. One gets different judgements about expressiveness and abstractiveness depending on these degrees of difficulty. I still hope that with enough experience of formal results it will become apparent how to simplify things somewhat, but your observation offers some intuition as to where the multiplicity of transformation classes is coming from.

I'd suggest the most important problem we can address for simplicity and smoothness is the ability to abstract and compose frameworks just as we abstract and compose functions.

Yes. Of course one can get all tangled up in the various meanings of "abstract" (and indeed we've done quite well at getting tangled in them), but at bottom composability is surely related to smoothness and 'abstractive radius', and I suspect a deep connection.

For me examples of simple

For me, examples of simple languages are C (debatable given the amount of undefined behaviour), Oberon(-07), Go, among many others.

As examples for complex languages offering enough expressive power, I would list C++, D, Go, Haskell, OCaml, Ada, among many others.

One thing to keep in mind

One thing to keep in mind with such comparisons is the scope of the features that are compared. C is a simple, compact language. Ada is big. That's fair as far as it goes. But if you want a meaningful comparison, you should include the conventions about organizing header files and the use of make on the C side, since you get these things for free in Ada and, more importantly, they are inescapable, so you can't "subtract" them from the Ada side of the comparison. Any reasonably sized system in C will make use of such things, even if they are not part of the lovely (K&R) definition of the language.

To put it more generally,

To put it more generally, the comparison is not simply between language A and language B + libraries. It is really with B+conventions+libraries+tools. When designing you can play with each of these levels. Personally, I like the semantic guarantees languages give which the other levels typically don't. But that's just me.

One could always consider

One could always consider the library and IDE as part of the language. Early Smalltalk systems really went in that direction, to the point that there was obviously some push back. Java's main innovation was a Smalltalk-like rich standard library that was essentially a part of the language.

I'm annoyed by PL teaching approaches that teach programming languages through interpreters alone: semantics are only a small part of the overall programmer experience, which should be examined more holistically.

PL teaching

My understanding is that courses that use interpreters as running examples are not meant to teach people how to program (they should know already) but rather the "science of programming languages". We now understand very well how to formalize the semantics of programming languages. Formalized they become a pure, mathematical idea (that is of practical interest) that we can explain to students.

On the other hand, I have no idea what would be a "scientific approach to libraries and tooling". I hear or say so often that we don't have enough research "about tooling" that it's becoming cliché; but yeah, we don't really know how to do that. I think this is related to your remark: as long as I don't understand what is scientific about libraries and tooling, I probably won't be comfortable "teaching them" in advanced classes.

(This leaves me wondering how a course around the "Sociology of PL adoption" would work. Make students statistically analyze more sources of language adoption data? Ask them to run around CS companies, asking questions about local programming language choice?)

After a good course about formal semantics of programming languages, I would expect people to be able to take a language definition and deconstruct it in terms of features they understand, and maybe (harder) have an idea of how to formalize new features they would discover in the process. Those are useful skills if you want to move easily from one language to another, or design programming languages yourself (it gives you a framework to express and discuss design choices, and the vocabulary to read research articles that may be relevant).

What would I expect people to retain from a "libraries and tooling" course? It would be nice if there was a way to look at libraries that would make getting used to a new big framework easier. I would also expect them to be good at designing new APIs, or maintaining and improving existing ones. I think I see where this may go. Regarding tooling, I have no idea. Is it about teaching static program analysis?

I understand what PL courses

I understand what PL courses are trying to teach: the "design of programming languages." But focusing on just semantics (because it's a science!) is quite silly; the utility of many languages goes way beyond their semantics.

Is programming language design a design discipline or a scientific discipline? Or is it reasonably both, with the two topics taught separately? Or should we continue to be irrelevant by letting PL designers design languages that actually get used while scientist types just criticize from our balcony seats like Statler and Waldorf about details that are slightly important but not anywhere close to the big picture?

Those are useful skills if you want to move easily from one language to another, or design programming languages yourself (it gives you a framework to express and discuss design choices, and the vocabulary to read research articles that may be relevant).

Yep, those scientist types use the only vocabulary they've been given, is that useful?

What would I expect people to retain from a "libraries and tooling" course?

My PL design course at UW did it right: they went through a bunch of languages in the context in which they were meant to be used! It doesn't make sense to study the semantics of Smalltalk alone: you need to study what makes Smalltalk so cool, which happens to include its entire ecosystem. Many PL researchers just don't get this.

Regarding tooling, I have no idea. Is it about teaching static program analysis?

We should teach the design of programmer experience, as a design discipline. That involves language, tooling (which is really a part of the language), and libraries (which is really a part of the language). I only avoid saying programming language design these days because many people don't think that involves ecosystem or context, that somehow all those AST elements exist in a vacuum.

Sorry, I'm feeling particularly combative today. The real person I should have this argument with (the guy who wrote the book), would probably rip me to shreds :)

Yes, but what/how?

(I had no problem with your tone at all. I enjoy those sunday LtU conversations.)

I understand that was probably for rhetorical purposes, but I think your Ivory Tower presentation of PL research influence over actual PL use is overly grim. My feeling is that there are quite a lot of interactions now, with for example Rust and Ceylon being thoroughly informed by academic positions on language design (I'm not even citing Haskell/Scala/F#, which are more like academic languages eventually acquiring mainstream credibility).

We should teach the design of programmer experience, as a design discipline. That involves language, tooling (which is really a part of the language), and libraries (which is really a part of the language).

I definitely agree on the principles, but it's more wishful thinking than an actual suggestion. What and how should we teach? If you were told tomorrow by a university dean that your ideas look great, and she's got a course on this topic reserved for you starting the week after, what would you do?

One idea would be to collect a list of "cool things that were forgotten by the mainstream", present their history and demo them. Smalltalk live images, I-don't-know-which terribly innovative version control system of the nineties, restartable exceptions, f-expressions, what else? I'm not sure this idea alone would make a good course.

Reflective towers?

Reflective towers?

My feeling is that there

My feeling is that there are quite a lot of interactions now, with for example Rust and Ceylon being thoroughly informed by academic positions on language design (I'm not even citing Haskell/Scala/F#, which are more like academic languages eventually acquiring mainstream credibility).

I'm not claiming we are useless, just that our toolbox is too limited without the support of an actual language designer; i.e. we can't stand as language designers on our own with only conventional training on semantics. And right now, the other 80% of our job is mostly a dark art that we have to learn through trial and error. The PL theory course then suffers from some false advertising.

Scala is a great example: its fight for acceptance suffered for a long time because its tooling was treated as subordinate to, rather than a peer of, language semantics.

What and how should we teach? If you were told tomorrow by a university dean that your ideas look great, and she's got a course on this topic reserved for you starting the week after, what would you do?

Survey, critique, and deconstruct. Basically, courses should merely bootstrap the experience process and provide enough clarity that the topics are less of a dark art.

One idea would be to collect a list of "cool things that were forgotten by the mainstream", present their history and demo them. Smalltalk live images, I-don't-know-which terribly innovative version control system of the nineties, restartable exceptions, f-expressions, what else? I'm not sure this idea alone would make a good course.

It worked for me, at least.

Are programming languages a

Are programming languages a branch of math? I think only a small part of the field can be formulated as mathematics. Sure, math is awfully convenient to teach, but if you limit a course to that you lose many important topics. A programming language is a tool for humans. Although the mathematical aspects give us important methods, just like you can't math your way to designing a house, you can't math your way to designing a programming language. Instead of just teaching the semantics of a language & mathematical models of the compiler/interpreter/type checker, you can teach about the different ways a human can actually interact with them.

The traditional model is that you write several files of code, you compile them & get type errors, link the object files together, and execute. The Lisp model is completely different. You have a long-running program that you submit code to piece by piece, which that program then executes. Instead of viewing a file as a monolithic unit that gets processed as a whole, in Lisp each definition is a command for the Lisp implementation that puts the contents of that definition in its memory, available for subsequent commands to use. In Smalltalk/Self the model is different again: instead of sending individual commands for definitions, you essentially edit the definitions in memory directly in a more structured way rather than funneling them through as a list of commands. Then you have modern IDEs for statically typed languages, where you interact with the type system in a different way, with on-the-fly type feedback. In Agda-mode you interact in yet another way, through holes. In Subtext it is again different.

Similarly, you can teach about different kinds of debuggers (traditional stepwise; Lisp & Smalltalk style, where the debugger is nothing more than the exception facilities in the language; execution recording or time-traveling debugging; whyline; Sean's recent work; the work on statistical debugging & debugging based on version control history; contracts as in Racket; quickcheck; how type systems fit in; etc.).

I imagine that an architecture student gets courses about material science & structural engineering, but I imagine they also have topics that focus on the fact that the point of a house is that a human being will be living inside it. It may be fuzzier and harder to teach, but that doesn't make it less important. Instead of giving up on teaching it completely, it's probably better to accept the fact that it's not as neat as the math.

Very true, and the extra

Very true, and the extra complexity of Ada's more principled modules makes programs simpler to reason about and more robust under refactoring when compared to the "simpler" C. For instance, changing the import order of Ada modules preserves program meaning, which is obviously not the case with C. The "more complicated" module semantics makes understanding and changing programs much simpler.

The meaning of "simple" needs to be clarified. I think "simple" really boils down to "provides clear, strong invariants with few (preferbably no) exceptions", as this is what programmers need to interpret the meaning of programs. The more invariants, the less context you need to understand code.

Running with scissors.

Attributes associated with "simple" might include lucid, clean, strong, short/spartan, consistent, and smooth (in John Shutt's sense). Given an initial kernel powerful enough, you can spin endless complexity out of this in application (via libraries), which might also have similar attributes of crispness and compactness when new concepts are easily added by what amounts to vocabulary. Suddenly that sounds banal to me as an idea resembling how natural language works.

Ehud Lamm: Any reasonably sized system in C will make use of such things, even if they are not part of the lovely (K&R) definition of the language.
Sean McDirmid: ... That involves language, tooling (which is really a part of the language), and libraries (which is really a part of the language).
naasking: The meaning of "simple" needs to be clarified.

For precision and clarity, it's useful to have separate words for different parts of our tools and phases of our work using them. We seem to use "language" in the PL sense for everything, in all phases, from starting kernel to eventual ecosystem. That seems odd given our penchant to use many words to describe parts of various designs.

     "I know," Crash raised a hand. "We take the first letter of a language — easy in C's case — and append K for kernel or E for ecosystem. So CK is C's kernel and CE is the C ecosystem. Or you can think of K and E as cross-cutting ideas across all languages where organizational patterns remain."
     "That gives us AK and AE for Ada's kernel and ecosystem, and JK and JE for Java's kernel and ecosystem?" Wil asked. "That's almost too easy."
     "Won't Python and Perl folks kill each other as they fight over the letter P?" Dex objected.
     "That's not my problem," Crash evaded with a faint smile.
     "Perl's not long for this world anyway," Ivy offered, then made the wuh-tchish sound-effect of a whip lash. "Sorry, bad joke."
     "I think Crash's point is you can abbreviate any time you want," Wil summarized. "There's nothing stopping us."

Anyway, there's a total-cost-of-ownership concept applicable to a programming language due to ecosystem and whatever collection of tools you have in hand from vendors or community, or your own personal ingenuity. One might argue a language is not really simple if the process of solving any real problem obligates you to bring total complexity up to par with other languages when the full suite of build and debug tools gets involved. Cost of full life-cycle is different from initial cost of getting started. Clearly a language's kernel must have an effect on its ecosystem of add-on tools, so we can judge a kernel in part by extra costs foisted on the ecosystem at large.

A problem of confusing ecosystem and kernel, accidentally or purposefully, is how this denies we can roll back to the start with just a kernel alone, either to start again from scratch or merely analyze problems in hypothetical terms with less hair. If I write a dialog between two characters arguing about a problem, using first principle concepts, the complexity is related to the kernel's size, gnarly shape, and consistency or lack of it. Discussions between two real developers would be similar in ease depending on kernel nature.

     "Let's try an exercise," Wil held up an object to show. "What are these?"
     "Scissors!" Dex snapped with the speed of a game contestant.
     "Right. And what do we call this end?" Wil continued.
     "Handle," Ned drawled. "Does this have a point?"
     "Bear with me," Wil persisted. "What about this long sharp part of the scissors?"
     "Blades," Dex answered, enjoying his new high score lead.
     "Yes, yes, and the joint where it hinges is held by a pin," Ned anticipated. "If your point is a language is like scissors, that's stupid and useless as analogy. What are you getting at?"
     "That we use words to talk about parts of tools," Wil said with a wounded look. "Was I taking too long?"
     Crash rolled his eyes. "Here you had this chance to make an endless series of references to movies with scissors in them, and you blew it."

Axiomatic Language

Axiomatic language is very small. It may be about as small as a Turing-complete formal system can get and still be human readable and expressive. It also has "generality". But it may not be simple for mainstream-language programmers.

Axioms together with rules of inference are insufficient

Axioms together with rules of inference are insufficient.

Please see LtU posting Computation is not subsumed by deduction (contra claim by Kowalski).

Axiomatic language is about specification, not deduction

Axiomatic language is about specification, not deduction.

There's simplicity, then there's intuitiveness

There's simplicity and then there's intuitiveness.

I would have said that the test of giving rules to a fresh newcomer of mean ability and intelligence has the effect of testing those rules for intuitiveness rather than simplicity. Often intuitiveness and simplicity go together, but often enough they don't.

Consider syntax; contrast Lisp syntax with (say) Perl. Or if that's so extreme that it's almost a cheat, contrast it with C. Clearly the Lisp syntax is simpler, yet from my experience, a fresh newcomer of mean ability will often prefer the more complex syntax. At least, I've heard many people with a little exposure to Lisp syntax say as much.

I'm not saying complex syntax is better; I don't believe so at all. I'm saying that IMO the proposed test is a mixed blessing for simplicity.

I wonder to what extent

I wonder to what extent there is such a thing as a fresh newcomer. Or is it, in practice, always a matter of what they're used to?

On another note, from my adventures in conlanging, I have a suspicion that what "complex language" means may be different for human beings than it is for computers. Some years ago, I thought natlangs were irregular simply because of the randomness thrown in by their evolution, so that when designing a conlang, the only reason to introduce irregularities would be to give it a naturalistic feel. But then I designed a conlang that was deliberately not bothering with naturalism, and found myself gradually introducing irregularities for various ergonomic/practical reasons.

Similarly, I wondered what

Similarly, I wondered what this newcomer was used to before; ie, where his tree of familiarity branched off from Rust or whatever language is being considered. To be annoyingly broad, the spectrum could run from "new to western civilization" to "new to this particular Lisp macro on our team project". I know of course that a medium value is wanted, and having read McCusker's answer, he says as much: a professional coder. Still leaves room for ambiguity.

Complex language is definitely a different thing for humans. Computers, for all their weakness in NL processing, can slam-dunk sentences like "The mouse the cat the dog the cow bit scratched barked at is gone", mismatched verbs and all. Meanwhile, humans process function words with a different part of our brain than for open-category words.

But to my mind, the more relevant simplicity is simplicity in problem-space. Eg, a Turing machine is, in some idealized sense, about as simple as you can get, but since it's nowhere near most problem spaces, it doesn't register as simple; it's not useful simplicity.

This leads to the standard argument of "With macros we write our own language for each problem-space".

right sort of simplicity

Sounds like the right sort of simplicity. Though I maintain macros aren't very abstractively powerful.

clarifying term "newcomer"

(Tehom is a very interesting handle when I google it; this is your first post. Your writing style strikes me as familiar, but I'll pretend I don't see it. Feel free to spin your own tale, but less anonymous strikes me as better.) Here, let me go back and expand fresh newcomer slightly after showing a bit of context first:

How do you know when it's simple enough? There's an empirical test. Write down rules, give them to a fresh newcomer of mean ability and intelligence, and see if they can apply and defend them in various cases ...

Now here's what I meant: fresh newcomer [to Rust, from a professional background in coding]. (For Rust, go ahead and insert any other language L, but I'll keep saying Rust to stay concrete.) In short, I don't think a test of simplicity matters at all for new coders learning to program. I think a typical target dev for Rust is a systems programmer who finds syntax detail less confusing than newbies. How things look is not as hard a problem as what they mean, when it comes to getting lost in edge cases. Going astray in semantics via ambiguity, lack of definition, and "do what I mean" features seems a bigger problem than complex syntax.

The sort of arguments I imagined a newcomer having with folks pushing a complex agenda might consist entirely of conversation in plain language without ever resorting to a language's syntax in examples. If you can't make a plain (natural language) description stand up, there's not much hope in defending specific code examples, because rationalizing what you're doing in reasonable terms is how you get along with colleagues. If they say what you're doing is stupid and you have nothing, it's time to cave.

Introduction

Thanks for the welcome. This is indeed my first post on LtU, though I've lurked a bit. I've just been a little lazy about filling in the bio and interests on LtU. I've been here and there using the handle Tehom for about 2 decades; most of those Google hits are me.

looks and meaning

I can't agree with the idea that how things look is not as complex as what they mean. On the contrary, if you examine many languages such as C++ or Felix you'll find the underlying semantics are quite simple and the major issues are in fact in the appearance.

Indeed, the Felix system uses a user defined grammar, so the "language" semantics can eventually be understood in terms of the underlying constructs. It was, however, an interesting effort to use records of function closures in the method of CLOS to implement something that looks remarkably similar to Java objects and interfaces .. entirely in the grammar.

The problem isn't merely finding simple semantics, but finding semantics with coherent lexical representations. And the simple fact is that programming languages are a bridge between trivial calculations and human thought, serving a wide variety of application domains. The result, I'm afraid, is condemned to complexity because humans are.

Semantic complexity

the underlying semantics [of C++] are quite simple and the major issues are in fact appearance.

Digging up a language whose semantic complexity matches that of C++ would probably be worth a PhD in esoteric software archaeology.

Ada is up there in language

Ada is up there in language complexity, it just fits together much better overall. I can't think of another language that would qualify.

Long shot

Ada and also PL/I crossed my mind, but they both seem a long shot from C++.

area inside vs circle circumscribing space involved

I had a similar reaction to Skaller's position, which almost reads like a taunt to come and trash the reputation of C++ by saying it's not that complex:

the underlying semantics [of C++] are quite simple

(Scratching my head.) If it's a taunt, I don't want to fall for it, but otherwise I'm interested in your reasoning. You are incidentally confirming my main point, which is that conversation about complex things does not require code examples, and that in fact agreeing on the meaning of simple words is often a challenge. Of course this is due to ambiguity, because simple statements are actually often merely vague, with different folks picking alternate interpretations.

@skaller: If it would not impose on your time unduly, one example of C++ semantics that is simpler than the associated syntax would help clarify your point that C++ semantics are quite simple. Please pick one that's not C syntax, though, since I think we're going with a premise that C++ is more complex than C.

I can give you one example of syntax I think is simpler than the associated C++ meaning. At the end of a virtual function declaration, if you insert "= 0" before the semicolon, the meaning of this can have complex effects hard for a coder to understand. If that function is called in the class's destructor, and a "pure virtual called" error aborts the program, this puzzles some people. The syntax is trivial, but the meaning is subtle in effects that can occur. Note this syntax doesn't bother me; I don't find pure virtuals complex.
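
For concreteness, a minimal sketch of the puzzle I mean (all names invented). Calling a pure virtual, even indirectly, from a destructor is undefined behaviour; typical runtimes abort with a "pure virtual method called" diagnostic rather than dispatching to the derived override:

    #include <cstdio>

    struct Base {
        virtual void hook() = 0;          // the "= 0" in question
        void shutdown() { hook(); }       // indirect virtual call
        virtual ~Base() { shutdown(); }   // by now the Derived part is gone
    };

    struct Derived : Base {
        void hook() override { std::puts("derived hook"); }
    };

    int main() {
        Derived d;
        d.hook();        // fine while the object is alive
        return 0;        // ~Derived runs, then ~Base: hook() now resolves to
                         // the pure virtual and the program typically aborts
    }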

In contrast, templates can let you run arbitrary computation at compile time, which is a surprising amount of extra complexity resulting from a small extension of syntax. That bothers me a bit, but I get that templates are awfully convenient, and that ship has already sailed anyway. I haven't looked more than briefly at C++11, but my sense so far is that high level "do what I mean" magic is present in some extensions, rather than leaning toward a "do exactly what I say" design focus more characteristic of C.
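
To make the compile-time computation point concrete, here's the classic toy example: a value computed entirely during template instantiation, before the program ever runs.

    // Compile-time factorial via recursive template instantiation.
    template <unsigned N>
    struct Factorial { static const unsigned value = N * Factorial<N - 1>::value; };

    template <>
    struct Factorial<0> { static const unsigned value = 1; };

    static_assert(Factorial<5>::value == 120, "evaluated by the compiler");

    int main() { return 0; }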

The general principle I want to invoke is the following. A language designer thinks up a neat idea, writes a description, then adds one more new language feature. The syntax to trigger this new feature is a small change, like a selector picking one item versus another, an A/B semaphore. So a bit of new syntax can unambiguously denote a new option without in any way capturing the complexity of the resulting behavior. So I'm very surprised to hear anyone thinks semantics is simpler than syntax, given this way of seeing it. (Edit: darn, I had syntax and semantics backwards in two different places.)

C++ syntax re semantics

In C++ a couple of syntactic marks can have a significant impact. In fact a complete lack of such marks can have significantly different behaviour: even in C not only is the meaning of (f)(x) context dependent (cast or function call?), you can't even parse it without context.
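
For instance (an invented fragment, valid as C++ too), the same token sequence "(f)(x)" cannot even be parsed without knowing what f was declared to be:

    #include <iostream>

    typedef double f;            // here f names a type...

    int main() {
        int x = 3;
        double y = (f)(x);       // ...so this is a cast of x to double
        std::cout << y << "\n";
        return 0;
    }

    // Had we instead declared
    //     double f(int);
    // the very same "(f)(x)" would parse as a function call f(x).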

However the underlying semantics of virtual dispatches are actually quite simple: it's an indirect jump with a coercion on the this pointer and return type. The function dispatched to can be tricky to determine during construction or destruction .. so don't do it.
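
A hand-rolled sketch of roughly what such a dispatch lowers to (not any real ABI's layout, just the idea of an indirect jump through a per-class table of function pointers):

    #include <cstdio>

    struct Shape;
    struct VTable { double (*area)(const Shape*); };

    struct Shape { const VTable* vt; double w, h; };

    double rect_area(const Shape* s)     { return s->w * s->h; }
    double triangle_area(const Shape* s) { return 0.5 * s->w * s->h; }

    const VTable rect_vt     = { rect_area };
    const VTable triangle_vt = { triangle_area };

    double area(const Shape* s) { return s->vt->area(s); }   // the "virtual dispatch"

    int main() {
        Shape r = { &rect_vt, 3, 4 };
        Shape t = { &triangle_vt, 3, 4 };
        std::printf("%g %g\n", area(&r), area(&t));   // prints 12 6
        return 0;
    }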

When you say "C++ is complex" you have to distinguish "Practical C++" from "ISO C++". By consensus of the committee the ISO standard defines complex rules to cover corner cases in "ISO C++" that are not part of "Practical C++", because the ISO Standard does not define the C++ language, it constrains the behaviour of C++ translators. The compiler vendors wanted an algorithm to implement in most cases, rather than argue about exactly where the borderlines of Practical C++ lay.

In other words, the rules of ISO C++ are actually simple, it's just that they're designed for compiler vendors, who don't care about the semantics, they just need to get the specified algorithm right.

impact is semantics

We're miscommunicating. The following is my point, but you argued syntax is more complex than semantics.

In C++ a couple of syntactic marks can have a significant impact.

Marks (and grammar) are syntax, but impact is semantics. You don't seem to distinguish syntax from semantics.

It was, however, an

It was, however, an interesting effort to use records of function closures in the method of CLOS to implement something that looks remarkably similar to Java objects and interfaces .. entirely in the grammar.

If you add a syntax for object-oriented programming that you compile down to records and lambda-expressions, this compilation pass specifies a part of the semantics of the resulting language (after syntax extension). That this part of the compilation scheme is written inside the production rules of a parser doesn't make it any less semantic.
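
A tiny sketch of that encoding (in C++ for concreteness, names invented): an "object" is just a record of closures over shared hidden state, and the translation that produces such records is exactly where the semantics lives, whether it is written in a compiler pass or in grammar productions.

    #include <functional>
    #include <iostream>
    #include <memory>

    // A record of closures standing in for an object with two "methods".
    struct Counter {
        std::function<void()> inc;
        std::function<int()>  get;
    };

    // The "constructor": both methods close over the same state.
    Counter make_counter() {
        auto state = std::make_shared<int>(0);
        return Counter{
            [state] { ++*state;     },
            [state] { return *state; }
        };
    }

    int main() {
        Counter c = make_counter();
        c.inc();
        c.inc();
        std::cout << c.get() << "\n";   // prints 2
        return 0;
    }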

Complexity

I believe my point was that the *underlying* semantics of a system and the complexity of the syntax are two different things. C++ has extremely complex overloading and namespace lookup rules, for example. Well I find them hard to follow!

On the other hand, underneath it's just a function call.

If you will accept the underlying semantics of C is just some calculating ability, memory access, conditional jumps, and function calls, then C++ only adds two things: virtual dispatch and exception handling. C++11 adds threading.

An important point here has already been made: Turing machines are simple and can do everything, but that doesn't make them amenable to programming. Adding complex constructs on top of a simple layer bridges the (domain and individual dependent) gap between human and machine comprehension. Designing a good language is an arcane art bridging a cognitive system which is primarily spatial and associative and barely capable of reason, to a mathematical model barely capable of abstraction.

I guess most people here would believe static type systems are a good way to bridge that gap, so we all may be a bit surprised practicing programmers don't agree and prefer Javascript. Why? Because the complexity is *delayed*.

For a bag full of corner cases

You are glossing over a lot of things here. First, the "semantics" of a programming language consists of its dynamic (i.e. run-time) and its static (i.e. compile-time) semantics. So overloading, implicit conversion, templates, specialisation, and what not are part of C++ semantics.

Second, the dynamic semantics of C++ is not simple either. For one, because it is heavily intertwined with the static semantics (for C++, you cannot define what a program does without reference to basically all the static semantics, which notably is possible for some other languages). But also, because even the core execution model is horribly complicated (just consider the memory model, sequence points, etc. -- let alone all the effective non-determinism through undefinedness).

Musing about "practical complexity" that can ignore "corner cases" doesn't help either. These corner cases affect the meaning of programs, and thus of the language. And C++ is full of them. Programmers ultimately have to understand them, because any non-trivial program will inevitably run into them and not work as expected if there are misunderstandings. They are an integral part of the definition of the language, and thus of the contract between programmer and compiler.

Corner cases

I have a famous book by Reynolds on semantics, which is quite explicit in saying it is concerned with the algebra generated from the concrete syntax, and not really with the concrete syntax itself; even though concrete syntax is used in the book, it is only to present the underlying algebra.

This is what I meant by *underlying* semantics. I do not dispute that the *concrete* semantics of C++, which includes overload resolution etc is complex and messy and has difficult corner cases, but arguing about "practical" aspects of it does matter. It's well known the C++ type system isn't sound, for example, but in practice this doesn't matter much: there is a sound subpart, and experienced programmers know how to avoid the other bits.

I'm not saying either of these things are good. But theoreticians have not succeeded in producing languages that have had much impact on industry and this is not just because industrial players are ignorant savages: it's also because the theoreticians have no idea, or at least no desire, to produce production systems, with a couple of exceptions.

Whilst part of this problem is surely the industrial/commercial situation of universities (where most research is produced), it hardly explains why academics continue to pursue highly advanced research goals when the knowledge of 30 years ago is still not utilised.

For example I learned about reentrancy in my first year of university study and I try to ensure my code is re-entrant, but vast numbers of systems out there are not reentrant. Yet my own C++ libraries are entirely re-entrant (except where it's necessary to bind to broken libraries like ISO C, Posix, OpenGL .. the list is enormous).
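
For readers unfamiliar with the term, a minimal illustration (invented names) of the non-reentrant pattern being avoided, next to a reentrant version where the caller supplies the storage:

    #include <cstddef>
    #include <cstdio>

    // Non-reentrant: the result lives in shared static storage, so a second
    // call clobbers the first (the classic strtok/localtime shape).
    const char* label_bad(int n) {
        static char buf[32];
        std::snprintf(buf, sizeof buf, "item-%d", n);
        return buf;
    }

    // Reentrant: no hidden shared state.
    void label_good(int n, char* out, std::size_t len) {
        std::snprintf(out, len, "item-%d", n);
    }

    int main() {
        const char* a = label_bad(1);
        const char* b = label_bad(2);     // overwrites what a points to
        std::printf("%s %s\n", a, b);     // prints "item-2 item-2"

        char x[32], y[32];
        label_good(1, x, sizeof x);
        label_good(2, y, sizeof y);
        std::printf("%s %s\n", x, y);     // prints "item-1 item-2"
        return 0;
    }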

Surely, it takes some skill to use a messy language like C++ well but it can be done. I have no doubt skill is also needed to use languages with stronger theoretical backing, such as Ocaml or Haskell. Perhaps the hard cases in Ocaml are more related to program structure or performance, perhaps in Haskell trying to write comprehensible performance code by guessing how it optimises: correctness at the expense of performance is not really acceptable either.

Maybe ATS has the right approach: be able to write reasonably fast simple code and make the typing more precise progressively to obtain better correctness and performance.

Anyhow the bottom line here is really: I was a member of the ISO committee that designed C++. I fought, within my limited knowledge and abilities, for a better system, and failed. Almost all the 130 or so committee members were computer, compiler, or library vendors or major users. Where were the academics that actually knew something about computer languages?

It's actually not true one has to ultimately understand all the corner cases because running into them is inevitable. The Felix run time is written in C++ and doesn't run into many corner cases because I designed it so it didn't. For example it avoids order of initialisation problems by simply not using global variables. Order of micro execution is avoided by using well known idioms and taking care not to run afoul of the hard parts of the sequence point rules. Inheritance is used carefully to avoid slicing problems. Nasty overloading issues are avoided by using distinct names.
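
As a sketch of the kind of hazard being designed around (invented example, not Felix code): two globals in different translation units have unspecified initialisation order relative to each other; dropping globals sidesteps the problem entirely, and where one is unavoidable, a function-local static is initialised on first use instead.

    #include <string>

    // The hazard, in outline:
    //     // config.cpp                 // main.cpp
    //     Config config = load();       bool ready = config.ok();
    // Nothing specifies which initialiser runs first.

    struct Config {
        std::string name = "default";
        bool ok() const { return !name.empty(); }
    };

    Config& config() {
        static Config instance;   // constructed the first time config() is called
        return instance;
    }

    bool ready = config().ok();   // safe: initialisation happens on demand

    int main() { return ready ? 0 : 1; }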

Of course the really hard bit of my system is the compiler and I avoid many problems with C++ using another method -- the compiler is written in Ocaml :-)

I have a famous book

I have a famous book by Reynolds on semantics, which is quite explicit in saying it is concerned with the algebra generated from the concrete syntax, and not really with the concrete syntax itself; even though concrete syntax is used in the book, it is only to present the underlying algebra.

Just to clarify, static semantics ≠ syntax, there are three separate parts to a programming language: concrete syntax, static semantics, dynamic semantics. Ideally, they can all be defined independently from one another, only sharing a definition of abstract syntax. Moreover, an algebraic treatment of semantics would require that both its parts are sufficiently compositional. None of these conditions are even remotely true for C++. So I don't see how it even makes sense to assume there is an "underlying semantics" that is somehow algebraic.

This is what I meant by *underlying* semantics. I do not dispute that the *concrete* semantics of C++, which includes overload resolution etc is complex and messy and has difficult corner cases, but arguing about "practical" aspects of it does matter. It's well known the C++ type system isn't sound, for example, but in practice this doesn't matter much: there is a sound subpart, and experienced programmers know how to avoid the other bits.

In my experience, a good half of the development time of an average C++ program goes into debugging crashes or related misbehaviour, even when you are careful. So the unsoundness of C++ matters a great deal in practice!

I've heard the myth about a sound subset many times. But AFAICS, no subset of C++ exists that is both sound and useful. It would have to exclude at least pointers, references, arrays, inheritance, unions, and casts. How can you write an interesting program without any of these?

I'm not saying either of these things are good. But theoreticians have not succeeded in producing languages that have had much impact on industry and this is not just because industrial players are ignorant savages: it's also because the theoreticians have no idea, or at least no desire, to produce production systems, with a couple of exceptions.

Whilst part of this problem is surely the industrial/commercial situation of universities (where most research is produced), it hardly explains why academics continue to pursue highly advanced research goals when the knowledge of 30 years ago is still not utilised.

This is a different discussion now, but as you say, the major reason is that the "business model" of academia usually does not allow spending substantial resources on building and maintaining production systems. So they move forward. Why would you blame them when they get 30 years ahead?

I wholeheartedly agree that the huge gap is really unfortunate. But having spent years in both academia and industry I still have no idea how it could be bridged. Neither side, at large, is provided with the incentives to make that a priority, or even a possibility -- it's quarterly numbers in industry, and publish or perish in academia. (Though I have to admit that I'm sometimes turned off by the amount of ignorance and even open anti-intellectualism in the software industry in particular.)

Anyhow the bottom line here is really: I was a member of the ISO committee that designed C++. I fought, within my limited knowledge and abilities, for a better system, and failed. Almost all the 130 or so committee members were computer, compiler, or library vendors or major users. Where were the academics that actually knew something about computer languages?

In fact, I know academics who were on the C++ committee, but gave up. And being a member of the Ecma JavaScript committee myself, I can feel their pain, and yours. ;)

It's actually not true one has to ultimately understand all the corner cases because running into them is inevitable. The Felix run time is written in C++ and doesn't run into many corner cases because I designed it so it didn't. For example it avoids order of initialisation problems by simply not using global variables. Order of micro execution is avoided by using well known idioms and taking care not to run afoul of the hard parts of the sequence point rules. Inheritance is used carefully to avoid slicing problems. Nasty overloading issues are avoided by using distinct names.

You can only design a program to not run into corner cases if you know what they are, and why you should stay clear of them. You may not need to fully understand them, but they still matter in practice. Unfortunately, following simple guidelines is not always possible, nor sufficient, and they tend to be non-compositional themselves.

Crashes and sound typing

Perhaps I misunderstand something here and I can be educated, but most crashes and misbehaviours in C++ have nothing to do with the lack of soundness of the type system. In fact I know of one case where the type system is unsound that most readers probably aren't aware of, and I've never heard of it being exploited either: during construction, you can leak a non-const pointer to an object typed as const, because in the constructor body the type of the this pointer is non-const.
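
A minimal demonstration of that hole (invented names): there is no cast anywhere, yet a const object gets mutated. The standard calls the mutation undefined behaviour, but the type system accepts the program without complaint.

    #include <iostream>

    struct Leaky {
        int value = 0;
        static Leaky* escaped;
        Leaky() { escaped = this; }   // `this` is Leaky*, not const Leaky*,
                                      // even when a const object is being built
    };
    Leaky* Leaky::escaped = nullptr;

    int main() {
        const Leaky frozen;           // declared const...
        Leaky::escaped->value = 42;   // ...yet mutated through the leaked pointer (UB)
        std::cout << frozen.value << "\n";
        return 0;
    }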

Most misbehaviours are due to breaking dynamic rules, and languages with sound type systems can and do have these as well. Both C and Ocaml type systems allow you to access an array or string outside its bounds, in fact C has a stronger type system here than Ocaml: gcc 4.8 actually catches quite a few cases.

The C type system is stronger because the length of an array is part of the type, Ocaml does not include the length. With respect to arrays, both systems are sound and both lead to crashes if you do an out of bounds access. Ocaml has the nice feature that you can guarantee array accesses are bounds checked; bounds checking in C is also possible but I'm not sure it can be guaranteed. However, checking of C++'s vector certainly can be.
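
A small illustration of the length being part of the type (the same holds in C and C++):

    int main() {
        int a3[3];
        int a5[5];
        int (*p)[3] = &a3;      // OK: the length 3 is part of the type
        // int (*q)[3] = &a5;   // error: int (*)[5] does not convert to int (*)[3]
        (void)p; (void)a5;
        return 0;
    }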

Perhaps I'm missing something! Consider another example: Ocaml ensures all variables are initialised before use, whereas C does not. This isn't an issue of soundness, but dynamics, and the soundness of Ocaml here isn't the same as correctness, in fact it can get in the way. In Ocaml you cannot forget to initialise a variable, but you can still cheat and put a dummy value. In that case there's a chance the improper initialisation will not be detected, whereas in C there's a good chance it's a NULL pointer and an improper use will be detected. In neither case is this related to soundness: C doesn't promise an object is initialised and it doesn't promise a pointer isn't NULL. The soundness of the type system just promises a pointer variable will be a suitable object, that is, 8 bytes long on a 64 bit machine, and suitably aligned.

If such a pointer variable were misaligned or only 7 bytes, then the type system would be unsound. [You can, of course, introduce unsoundness if you use casts]

There are well known issues in C++ like this one: because in C arrays silently degrade to pointers to the first element, and because in C++ upcasts from derived to base classes are implicit, an array of derived objects can be represented by a pointer to a base object, and an increment of the pointer can fail to access the second derived object.
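
A sketch of that failure mode (invented names): the pointer arithmetic strides by sizeof(Base), not sizeof(Derived), so the "second element" is not an element at all.

    #include <iostream>

    struct Base           { int a = 1; };
    struct Derived : Base { int b = 2; };   // bigger than Base

    void print_first_two(Base* p) {
        // p[1] lands inside the first Derived object; reading it is undefined.
        std::cout << p[0].a << " " << p[1].a << "\n";
    }

    int main() {
        Derived arr[2];
        print_first_two(arr);   // Derived* converts to Base* implicitly
        return 0;
    }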

But this is not an unsoundness. In C/C++ you can increment any pointer variable, whether it points at an array of that type or not, whether it points off the end of an array, whether it points to a single object, or it is NULL, points to a de-allocated object .. or is just plain uninitialised.

Unless I misunderstand, soundness is entirely a theoretical property of the type system itself: whether the rules of inference allow deriving the wrong type for something.

Of course I agree that C++ lacks compositional properties at all levels and the type system is crap, but soundness is only an indicator. Lack of expressiveness, to me, is a vastly more important condemnation (no sum types for example!).

Soundness

Type soundness connects the static with the dynamic semantics. It states that the static semantics provably prevents any invalid behaviour from happening at runtime. Now, there is some leeway in defining what's "invalid", but it at least has to imply a progress property, i.e., the unreachability of program states for which no rules exist in the dynamic semantics.

In other words, soundness (at least) rules out what the C/++ spec calls "undefined behaviour". That is, a language with a sound type system cannot crash -- or as Milner put it: "well-typed programs can't go wrong". And all the "dynamic misbehaviours" you list are in fact text book examples of glaring soundness holes in C++'s type system.

Of course, in a low-level language like C++, always enforcing soundness may be overly restrictive, since some low-level techniques are very hard to type. So such a language is expected to have loop-holes. However, there is a whole world of difference between tying these loop-holes to explicit and isolated escape hatches (Modula-3, OCaml), or having them all over the place, lurking in various seemingly innocent language constructs (C++).

Arrays are a good example: in OCaml they are sound, since it is defined to check bounds at runtime. In C++ they are unsound, since it does not define what happens when you go out of bounds, but does not prevent it either, and in practice just happily dereferences random memory (plus, there is the well-known soundness hole in subtyping that you mention). The inclusion of array sizes in array types in C/++ does not really matter, since their type system mostly ignores them (GCC implementing some additional warnings on top is nice, but is neither sound nor complete nor part of the language definition).
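
To make the contrast concrete in one language, a sketch using C++'s checked at() as a stand-in for OCaml's always-checked access; the unchecked read is shown deliberately, to make the point that nothing stops it:

    #include <iostream>
    #include <stdexcept>
    #include <vector>

    int main() {
        int raw[4] = {1, 2, 3, 4};
        std::vector<int> vec = {1, 2, 3, 4};

        // Undefined behaviour, accepted silently: in practice this just
        // reads whatever memory happens to follow the array.
        std::cout << raw[7] << "\n";

        // Defined behaviour: at() checks the bound and throws.
        try {
            std::cout << vec.at(7) << "\n";
        } catch (const std::out_of_range& e) {
            std::cout << "caught: " << e.what() << "\n";
        }
        return 0;
    }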

Execution model?

You seem to be thinking of the execution model of the language (some sort of abstract machine model, maybe). I think this may be a good direction, but it should be clearly distinguished from the more general notion of semantics, since semantics need not be defined this way.

Most languages assume some implicit execution model, and it is worthwhile to consider what that model is and how complex it is. Understanding the execution model takes cognitive resources, but keep in mind that once you grasp it, what you know applies to all programs.

One more thing. While an execution model or abstract machine has its own semantics, and we can translate to a simpler model (LC, RAM, TM or what have you), it does make sense to factor out this process since programmers typically only need to think about the translation from code to the abstract execution model. We don't have to think about actual real life CPUs most of the time.

So, now to C++: what I think people here said is that C++ is not really simple since the actual computational model it embodies, say the memory model and sequencing points, is actually more complex than a cursory glance at the language suggests. I think this is one place where the move from K&R C to modern C++ increased complexity. The underlying machine model became way more complex, even in places where the goal was to leave the "semantics" of the language stable.

Execution model

Again, when I say underlying semantics, I am thinking specifically of code *after* binding. This means all the lookup is done, so that is excluded, but what is left isn't merely execution semantics. Although the type system is used to do the lookup, type information isn't erased in bound code. So there is still type based static semantics left, as well as execution.

One must remember that in C++ correctness is only one goal: performance is another. The C++ Standard Library actually contains strong performance requirements, and programmers are vitally concerned about how, say, inlining is done, or whether tail recursion optimisation is performed.

In my opinion, theoretical results (theorems) and program optimisation are one and the same thing. So if there's a fault here it is the academics who have failed to explain how these two things are coupled, despite the well understood Curry Howard isomorphism (which basically says if you can prove it's right you also made it faster).

This is quite apparent in ATS though. Clearly, say, run time array bounds checks (which provide only dynamically assured correctness) can be eliminated using the dependent typing to prove an index is within bounds. You don't just get a nice early (compile time) proof of correctness, you get faster execution too.
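
A very rough C++ stand-in for that idea (nothing like ATS's dependent types, just the flavour; names invented): when the index is established in bounds at compile time, the runtime check disappears.

    #include <array>
    #include <cstddef>
    #include <iostream>

    template <std::size_t I, typename T, std::size_t N>
    T get_checked(const std::array<T, N>& a) {
        static_assert(I < N, "index proven in bounds at compile time");
        return a[I];                      // unchecked access, provably safe
    }

    int main() {
        std::array<int, 3> a = {10, 20, 30};
        std::cout << get_checked<2>(a) << "\n";
        // get_checked<5>(a);             // would fail to compile
        return 0;
    }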

Theoretical results

In my opinion, theoretical results (theorems) and program optimisation are one and the same thing. So if there's a fault here it is the academics who have failed to explain how these two things are coupled, despite the well understood Curry Howard isomorphism (which basically says if you can prove it's right you also made it faster).

Theoretical results can be many different things, and completely unrelated to performance. In particular, I don't see how Curry-Howard per se says or implies anything about optimisations. What you have in mind are probably certain program equivalences that correspond to program transformations a compiler might want to perform. That's an interesting class of formal statements you can make (and prove) about a language, but by no means the only one. Just take bare type soundness results as a different example.

Complexity is essential

The original request for simplicity is hard to achieve. I would like to illustrate why with a requirement for a procedural language.

What do we need? We need some base types. At least 20 or so lifted from C, since we care about size and performance. This immediately raises the issue of how we combine numbers of different types.

Now, let's look at data structures. We need at least products and sums. Tuples are convenient for small products but we need records with named fields for bigger ones. Then there's the special case of a product of elements all of the same type but in massive numbers: so we need arrays as well, particularly to get dynamic indexing.

For sums we need at least a simple form, and nominally typed sums are popular, although Ocaml also provides structurally typed sums, and Felix also provides numbered sums.
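
For concreteness, one possible C++ spelling of those products and sums (assuming C++17 for std::variant; a sketch of the categories, not a claim about any particular language's design):

    #include <array>
    #include <string>
    #include <tuple>
    #include <variant>
    #include <vector>

    using Pair   = std::tuple<int, double>;          // structural product (tuple)
    struct Person { std::string name; int age; };    // nominal product (record)
    using Fixed  = std::array<int, 8>;               // homogeneous product, fixed length
    using Dyn    = std::vector<int>;                 // array with dynamic indexing
    using Number = std::variant<int, double>;        // sum type
    enum class Tag { Circle, Square };               // a "numbered sum" at its simplest

    int main() { Person p{"Ada", 36}; return p.age > 0 ? 0 : 1; }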

So we already have 6 syntactic data type constructors, and we probably want nominally typed products as well .. and we haven't even started to think about subroutines and control. Since we want performance we not only need lazy evaluation for inlining but eager evaluation for closures, so we have at least three evaluation models (lazy, eager, and indeterminate).

Then, for a real language, we have to have coroutines, and we haven't even started to think about asynchronous I/O or concurrency.

And architecturally a language is heavily constrained by tooling, whether it be needing to #include header files, compile separate interfaces, or load shared libraries under program control.

Every production language needs all this stuff, although most have parts of it missing because they're TOO simple. Did I forget to mention parametric polymorphism? Overloading of some kind? Ways to modularise the programs?

Even a simple thought experiment on the requirements for a decent production language is inherently full of complexity and we haven't considered compatibility with other languages or older versions of the same language, let alone details of the implementation.

Yes, only some is accidental.

You're saying a lot of interesting things, and should be rewarded for it, but often folks don't follow up. I especially liked your replies to Andreas Rossberg and Ehud Lamm. But here I'll focus mostly on your reply just above. (I have something complex to say about an earlier post that might not be worth it.)

I grant some complexity is essential. I didn't think my original request was easy, just a good idea. Here I quote (and number) four of my original sentences, whose points I want to reference in my comment below:

1. If you want design to be simple, it must be a goal.
2. Boil things down to basic definitions where they work when applied to complex cases.
3. You did it wrong when it requires an expert to adjudicate normal discussion.
4. Ask others for help when you want to know if it's simple now, because your perspective is suspect.

Maybe you're just saying simplicity is hard, which I'll grant. But you also might be arguing the opposite of one of those numbered points: that aiming for simplicity is hopeless, that basic definitions are infeasible, that experts are needed to hold everyone's hands, and that you have a totally unbiased and objective view. If you agree with any of that, then I want to argue, but otherwise I don't think I do.

My thesis was that a median developer should have a pretty good idea what's happening, most of the time with features commonly used, as opposed to writing code by cut-and-paste without understanding what it really means. I think it's common for folks to write code without grasping consequences, and since that seems bad, we should aim to improve grasp of code by design, partly through simplicity. If you think it's hopeless for devs to understand what they're doing, that's too negative a perspective for me to accept.

Now I'll veer in a new direction and maybe change subjects in reply to your comment:

we have to have coroutines, and we haven't even started to think about asynchronous I/O or concurrency.

I've been thinking about that part a lot the last few years. In fact, with regard to programming languages, that's the only thing I've been thinking about. Lately I'm writing about an idea I won't post until there's enough to read. (I should do that instead of participating on LtU much any more.) I'm not sure it's a good idea, but a first prototype might be interesting. I started toying with explaining a bit here after Ehud's post about execution models. Basically this has to do with the lack of specification in C's ABI, and how this lets you transform one ABI to another in C, to target a VM with async green-threaded coroutines.

Help

Actually I'd love to find a place where I could ask for help, not so much "is it simple" but "I think this can be simplified but I want to be assured by a theoretician before implementing a whole lot of stuff only to find belatedly it doesn't work". But there's no such place I know of to ask (at least where I could expect a polite answer :)

Yeah, getting help via feedback can be hard.

For basic things, like how clear your ideas are to a lay person, anyone near you can give feedback, if they find the general topic interesting at all. A peer at school or work who's not up to speed on your current focus can give useful reactions when you know something about how good they are at other things, for a bit of context. I'm usually most interested in what someone dislikes or doesn't understand. It's also good to know when one of your original priorities is a mystery to a person you imagined using a tool.

Actually I'd love to find a place where I could ask for help...

Well if you do something so technical and detailed it needs an expert to give good quality feedback, you can try attacking your effort yourself, then showing a reviewer the attack too, while asking them to try to make the attack more severe than you were able to manage. It might provide a slightly more enjoyable game-like quality. Something with a programming language focus might get some feedback here, though it's common to see only sparse postings about narrow topics. You should try to assure folks negative feedback is okay, and there's no reason to fear an adverse reaction from you, if it's not all positive feedback.

I'm leery of specific assumptions

I'm leery of specific assumptions about what's "needed". To me, there's always the tantalizing possibility that we're doing things all wrong, and some different approach would give us much better results. Just for example, you start by saying "we need some base types." I'm not so sure we want traditional types at all. Do I know what we might use instead? No, I don't. But I recall somebody's suggestion (a variant on a joke about "Is there a God?") that we might construct a super AI and ask it "Is the Riemann hypothesis true?", and be told "Yes, but you wouldn't understand the proof." My comeback would be, "Is there a simpler proof that I would understand?", and I rather think, if the AI were as smart as it thinks it is, it would answer, "I don't know."

leery

I'm leery of specific assumptions too and I'm not making them, just giving an example of how even some really basic requirements rapidly introduce complexity. As a matter of fact I have made these assumptions experimentally and produced a language based on them, though of course not the only possible one, and clearly these assumptions aren't the only possible ones either. I'm still weighing up by actual use which things could be eliminated, and trying to find ways to unify what's left.