The programming language of crash test dummies.

A lot of posts here are about how to show a language to be well-behaved for some definition of well-behaved.

How about the opposite? How do we construct a language which, while it has a well-behaved and consistent subset, also allows us to reproduce and demonstrate every kind of important semantic problem and ambiguity that a student needs to be aware of and understand? Or serve as a "direct" translation target for programs written in languages that had those problems?

By "direct" I mean we may need to define operations and locally transform syntax to the new language, but we wouldn't need to write an interpreter to achieve, eg, their problematic calling discipline or argument passing or exception handling or the way their dynamic environments interfered with their macrology.

Ideally, the constructions that are connected to known problems would form a "minimal set" -- i.e., we would like to introduce the smallest number of such constructs possible to get a complete map of these problems, and be able to demonstrate each concept.

So what would you look for in a language intended to demonstrate *unsafe* behavior, especially in correct implementations of features that some reasonably smart people once thought were good ideas?

Ray

I must be missing something.

I must be missing something. Have you not just described C?

Bingo

I admit I had precisely the same reaction...

Earlier versions of C

Earlier versions of C certainly had more problems than C89 and later. But C still honours lexical scoping, so it by no means exhibits a comprehensive "total list of problems students should be aware of". It's tough to find a language, other than assembly-like ones, that does not honour lexical scoping. You'd need a language like Lisp that provides dynamically scoped variables, or perhaps a stack language.

Edit: basically, you need a language that can simulate just about any other language via an EDSL without looking syntactically too alien. So basically Lisp, or maybe Haskell.
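
For concreteness, a minimal sketch of the lexical/dynamic contrast in Racket (my example; parameters stand in for the dynamically scoped variables):

    #lang racket
    ;; lex is an ordinary (lexically scoped) binding; dyn is
    ;; dynamically scoped via a Racket parameter.
    (define lex 'outer)
    (define dyn (make-parameter 'outer))

    ;; report's free references resolve differently: lex where the
    ;; procedure was defined, dyn in whatever dynamic extent it runs.
    (define (report) (list lex (dyn)))

    (define (caller)
      (let ([lex 'inner])                ; invisible to report
        (parameterize ([dyn 'inner])     ; visible to report
          (report))))

    (caller)  ; => '(outer inner)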

Dude, he's obviously talking

Dude, he's obviously talking about JavaScript.

it's even worse than that

He was clearly thinking about Perl.

Not even close

Forth

Yes, clearly. C can't be used this way.

No. C is a very limited language. Just because it is notorious for allowing some kinds of problem, doesn't mean it meets the criteria I mentioned. Nor are the problems it does allow anything like a complete set. C is too simplistic for anything like a complete set to arise, other than by building interpreters for other languages in C. You couldn't demonstrate problems relating to call-by-name, spaghetti stacks, true dynamic environments, closures, continuations, or lazy evaluation in native C, for just a few of the more obvious examples.

Think about the direct translation thing; you can't develop direct translations of programs in other languages into C. You can compile to C, but that's a very general and indirect translation, not just a set of libraries plus local transformation of syntax.

C fails to be this language in another way, too; it doesn't have a well-behaved and consistent subset that is reasonably usable. Because it is nearly impossible to program in C without using assignment, you can't demonstrate programming without assignment, then introduce assignment and demonstrate exactly which problems arise that you didn't have without it. C doesn't have automatic memory management, so you can't introduce manual memory management and the challenges it presents. C doesn't have closures, so you can't demonstrate variable capture. C doesn't have a consistent default scoping strategy, so you can't introduce other scoping strategies in a controlled way. And so on.

Also, C's limited collection of dangers is by no means enabled by a minimal set of dangerous features. The set of semantic problems that you can demonstrate with C are each enabled in dozens to hundreds of different ways by different interactions between language constructs. There isn't just one mutation operator, for example. There are dozens.

And finally, there is no clean separation of these features from such non-problematic functionality as could operate without them. For example, the humble for-loop cannot be accomplished in native C without mutation. C's design forces you to use mutation to get loops done, even though loops don't really need it. That's hardly a good way to show that looping can be accomplished without it and then introduce mutation to show what problems arise that you didn't have before.
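
For contrast, here is roughly what that lesson looks like in a Lisp -- a small sketch (the function name is arbitrary) of a counting loop with no mutation anywhere, using a named let:

    #lang racket
    ;; A counting loop with no mutation: the named let rebinds i and
    ;; acc on each iteration instead of assigning to them.
    (define (sum-to n)
      (let loop ([i 0] [acc 0])
        (if (> i n)
            acc
            (loop (+ i 1) (+ acc i)))))

    (sum-to 10)  ; => 55; only after this would set! be introduced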

So, no, I haven't just described C. C fails to have every single trait that I mentioned in the original post.

If anything, C teaches students that "you can't live without $PROBLEMATIC because it's mixed up in everything important!"

My objective, by contrast, would be to show first that you can, and then exactly which behavior causes the danger, and then exactly what the dangers of each behavior are, so that people with the experience of it could look at a language design and say, "you shouldn't use $PROBLEMATIC to achieve such-and-such construct, because you can easily do such-and-such without it, and $PROBLEMATIC leads to these problems which I remember actually experiencing only *after* we introduced it."

Ray

I don't think such a

I don't think such a language can exist. At the least, the interplay between different features would have conflicting meanings. Consider a language where you can inspect everything and a language with parametricity: they can't coexist. You'd end up with a hodgepodge of features that don't cooperate, or end up breaking every abstraction you try to create.

lisp

You're talking about lisp.

Maybe Logo or Emacs Lisp

Maybe Logo or Emacs Lisp

I'm thinking Racket

I'm thinking Racket. The Racket runtime provides a flexible range of options for object systems, finalizers, continuations, semaphores, and pointer manipulation, and the project has downright gone out of its way to support pedagogical DSLs and one-off exploratory languages. I think new calling conventions could be supported by redefining the #%app syntax for a given module, rather than building a complete compiler for each one. Of course I don't expect Racket to be a silver bullet here, but it looks like it could cover most of the cases discussed so far.
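
As a rough sketch of what that could look like (the file name and macro name here are invented), a module language that swaps the calling convention by rebinding #%app to a tracing variant:

    #lang racket
    ;; trace-lang.rkt (hypothetical): same as racket except that every
    ;; application is routed through trace-app instead of #%app.
    (provide (except-out (all-from-out racket) #%app)
             (rename-out [trace-app #%app]))

    (define-syntax-rule (trace-app f arg ...)
      (let ([v (#%app f arg ...)])   ; defer to the ordinary convention
        (eprintf "applied ~s => ~s\n" 'f v)
        v))

    ;; A client module then opts in with:
    ;;   #lang s-exp "trace-lang.rkt"
    ;;   (+ 1 2)   ; traces "applied + => 3" and evaluates to 3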

Racket does come pretty close...

I think Tom Lord was right that it has to be some kind of Lisp. Racket is built around a Lisp implementation, not coincidentally. With a lisp it's easy to provide the kind of syntactic abstraction that lets you make direct translations from other languages simple, provided only that your call semantics are a superset of both their call semantics and everything they can do with basic syntax.

A lispy language with lazy evaluation including lazy multiple return values, lexical and dynamic environments, first-class environments, nonlocal exits, context-sensitive syntax, and what you could call "first class macros" or more accurately "first-order functions" would probably do it, right?

Nuts. That's actually a fairly precise description of the language I'm working on. I suppose that confirms that I'm insane.

My motivation is mainly to create a "translation target." Which is to say, my objective is that code originally written in most other languages should be easily translatable into this one without significant global transformations or reduction of the abstraction level. In most cases, translation should be a simple matter of providing a runtime library for the language builtins, and rendering the abstract syntax tree of the original program in a form acceptable to the parser.

The problem with that is that a program which "does the same thing" when rendered in a different language does not, as far as humans are concerned, have the same value *unless* static analysis can prove as much about that program as it could prove about the original. I mean, it may turn out to have exactly the same semantics, but if you can't provide the same set of static guarantees, then you can't provide the same value that the code in the original language actually has to humans who value static guarantees.

It's fairly easy to declare that a module "fails" unless a specified set of static guarantees for that module can be produced. Such static tests as a module passes can trivially be made into requirements for all future versions of the module.

And patterns of can't-do-that in the original programming languages turn into patterns of doesn't-do-that in translated programs, which can at least in theory be leveraged for the same set of static guarantees. But this is harder.

In practice that's turning out to be much harder than I thought it would be, because nonlocal restrictions have to be expressed and checked. All the "simple" ways to do this would multiply my set of primitives. Where Scheme, for example, has a single very general 'define', I need a variation that promises the thing defined will never be mutated if I'm going to translate a functional language. And since that's the well-behaved default, it splits the one concept into 'define' (implicitly defining immutable things) and 'define-mutable'. But that's not the end of it. After that you have to consider variations for things that can be mutated only by code in particular modules, or in particular object types, or in particular scopes, and 'define' winds up split into dozens of forms for defining slightly differently-restricted things.
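
To make the split concrete, the immutable default can at least be approximated in today's Racket -- a sketch with invented names, rejecting 'set!' on one binding at expansion time:

    #lang racket
    ;; An "immutable define" for a single binding: references and
    ;; applications pass through to a hidden variable, while set!
    ;; is rejected when the program is expanded.
    (define hidden-x 1)
    (define-syntax x
      (make-set!-transformer
       (lambda (stx)
         (syntax-case stx (set!)
           [(set! _ v)
            (raise-syntax-error 'x "binding was defined immutable" stx)]
           [(_ arg ...) #'(hidden-x arg ...)]  ; application position
           [_ #'hidden-x]))))                  ; reference position

    (displayln x)   ; => 1
    ;; (set! x 2)   ; would be a compile-time error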

This is dire, because restrictions on these objects need to be program-wide to have the same semantics, but I don't want to do whole-program transformations to achieve translations. And it's only half the problem. After that, I need to construct, by static analysis, an authorization proof for every use of 'set!', to ensure that assignment does not in fact violate the (mutability and static-type) restrictions placed by definitions. To provide the same value to humans, the static analysis has to have the same semantics in identifying (and, depending on the requirements of the local module, disallowing) assignments that could fail because of those restrictions.

I believe this is possible. But it's Hard. I have been 'stuck' for weeks trying to develop a suitably general way of explicitly expressing and statically checking restrictions within individual programs (as opposed to implicit restrictions across a whole language runtime) that doesn't require programmers to master a dictionary's worth of different versions of 'define' and/or 'set!'.

Ray

grow vs. plan

I think "lisp" in some plausibly authentic sense is not really a language at all -- it's a programming environment. It's an anti-language in this sense: If lisp hackers find some programming concept that isn't neatly expressible in lisp they'll look for a way to neatly extend lisp to afford that programming concept. Commodity "programming languages" in contrast are fixed at specification. They are characterized by lists of things you can and things you can not do. You adopt a language for the convenience of what it does, resolving to live within the constraints of what it refuses to let you do. With lisp it is different: there is no essential difference between hacking within the language as you found it and hacking the implementation to extend the language with at least partial coherence. "Lisp" is a commitment to a language that refuses no feature -- it just doesn't have them all yet. "Lisp" doesn't even require that all features compose sensibly. "Lisp" is a programming environment in which programming language concepts can be brought forth as convenient, not a programming language in which a particular finite set of concepts defines the totality.

(That's why lisp machines ran lisp, common lisp describes behavior they have in common, and modern Scheme is not a lisp. So to speak.)

So if you're stuck planning out the ultimate lisp, one strategy to consider is to instead think of your project as building a living lisp rather than a defined language. Fix in practice whatever you've got that works, with the understanding that it's a lisp environment to be extended at all levels (within the initial dialect and below it) over time. You'll never win if you try to play the game of surveying all the other languages and trying, at the outset, to build a unified basis set for them -- there will always be something you didn't think of, or some things that don't fit together neatly. Let your "primitives" multiply without bound over time rather than searching for the holy grail of an imagined minimal set.

You have a point...

But constraints, traditionally enforced by the languages, have become an increasingly important means of providing structure and abstractions in themselves. As important, I would say, as variables and procedures.

Constraints have been left to language definitions (and scoped globally in languages) because we haven't developed ways to define and enforce (and scope) constraints equivalent in expressive power to the ways we can define and call (and scope) procedures.

So this is the thing you described, I guess; a hacker looking to extend Lisp to neatly provide a feature found elsewhere. I want to provide a way for programmers to define constraints on nonlocal modifications of data and have the runtime enforce them.

I'm making a few extensions to what previous Lisps have done. The argument passing and result return discipline is by far the most fundamental. It allows many abstractions that otherwise would require macros or whole-program transformations to be expressed as procedures. But it also enables procedures to do ridiculous things with and to the caller's environment.

In the absence of a set of constraints, chosen by the caller and enforced by the runtime, about what a callee is not allowed to do, the power of these function calls absolutely precludes, on safety grounds, a lot of useful programming techniques that would otherwise exploit them: patterns like callbacks or client code, for example, become visibly insane when a callee can inject a binding into your environment that may shadow an existing one.
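
The nearest approximation of that hazard in today's Racket is an unhygienic macro standing in for such a callee (a sketch; the names are invented, and in Racket only a macro, not a procedure, can reach the caller's environment this way):

    #lang racket
    ;; A "callee" that injects a binding named it into the caller's
    ;; scope, as a callee with the extended calls could.
    (define-syntax (call-with-injection stx)
      (syntax-case stx ()
        [(_ expr body)
         (with-syntax ([it (datum->syntax #'body 'it)])
           #'(let ([it expr]) body))]))

    (define it 'mine)
    (call-with-injection 42
      (displayln it))   ; prints 42: the caller's it was shadowed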

So a general mechanism allowing code to name and specify constraints on what other parts of the program are allowed to do, just as they name and specify procedures and other data, is a necessary complement to the extension of function calls. Without it the extended function calls don't deliver what they need to deliver.

static vs. dynamic enforcement

If I understand correctly, you are not looking for a perfect static type checking algorithm to reconcile caller constraints on its environment with callee actions on the caller's environment. You're looking for a nice way to dynamically check for callee violations of dynamically expressed caller constraints. (And then presumably domain-specific static checkers could eliminate the dynamic checks for some programs.)

Is that right? Is it just the caller's environment of interest or are you looking for something more general?

Right, dynamic enforcement to start with

Dynamic enforcement is doable, and even straightforward. Most constraints (such as constraints on argument types) are expressible simply as local assertions. That model can be extended, and made more complete, by forcing any 'constrainable' action performed by remote code to go through code local to the constrained data, where such assertions can be expressed. That's trivial (already done) for function calls, but needs extension to cover mutation.

In practical terms, that means having each environment or binding contour contain a 'setter' that is called by any 'set!' affecting something bound in that environment. Constraints on mutability, or type, or who is allowed to set things, can be enforced as assertions in the local 'setter' routine. This works more generally than it could in most languages because a called function can see where it's being called from (or, equivalently, you can have different locally-defined 'setters' for the same binding contour injected into the scopes of different possible callers), so it's possible to express fine-grained and arbitrary constraints on where something can be set *from* (insert foo about representing inheritance diagrams and private/protected/public data).
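
In today's Racket terms the idea looks roughly like this (a sketch, with invented names): the only way to mutate the value is through a locally defined setter that runs the constraint as an assertion.

    #lang racket
    ;; A binding whose mutations all pass through a local setter;
    ;; the constraint (here a type test) is enforced as an assertion.
    (define (make-constrained-cell init ok?)
      (define value init)
      (define (setter new)
        (unless (ok? new)
          (error 'setter "constraint violated by ~s" new))
        (set! value new))
      (values (lambda () value) setter))

    (define-values (count set-count!)
      (make-constrained-cell 0 exact-nonnegative-integer?))

    (set-count! 5)       ; passes the local assertion
    ;; (set-count! -1)   ; would raise: constraint violated by -1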

But finding a good way for the programmer to *express* what needs to be enforced and what other code it needs to be enforced against is taxing my brain. I seek four virtues: clarity, succinctness, explicitness, and generality. Combinations of two are easy, most combinations of three are hard, and any combinations of all four, if they exist, I can't quite envision yet. Then again, I'm handing them the means to do syntactic abstraction, so it seems likely that the basic syntax I'll provide is nearly irrelevant to the way it will actually be used. So I'll pick one virtue for this, probably succinctness, and abandon it.

After that comes the work of trying to get static analysis to, at least occasionally, return some result other than, "can't tell whether or not the assertions will be invoked by an attempted violation."

In fact, given that I want the programmer to be able to express even ridiculously specific constraints that no static language enforces because no analysis could determine them before runtime, I'm resigned to the fact that "can't tell" will never be eliminated from the possible results of any static analysis.

The static analysis isn't my first priority. I know it will never be perfect or total, and I know that static analysis will never suffice for all of the constraints that any general (programmable) system can express. I hope, though, that if the same constraints inherent in some static language are explicitly expressed in translated code, a static analysis of that code will eventually be able to provide the same guarantees that existed for that code in its original language.

Ray Dillinger
