Practical Bits of Making a Compiler for a New Language

I'm designing (and hopefully implementing) a new programming language. Now, I've taken a bit of language-design advice I once heard from a wise old professor, which is: "Never add an entirely new feature to an entirely new language." Instead, I'm consolidating older, thoroughly-tested ideas whose time has come into a language suited for a domain that has yet to see such features.

The vision? An advanced, almost-type-safe imperative language with functional and object-oriented features for kernel and base-library level systems programming. The goal in creating this? To bring the convenience of high-level language features to low-level programming.

The very foundation of the language is:

1) Lexical scoping.
2) Imperative block structure based on statements.
3) A strong type system with a single exceptional case of code marked "unsafe" (because when writing a kernel you either allow unsafe code somewhere, or develop a type system that could probably win a Turing award).

The features I'm putting in are:

1) Module/unit files. See: Delphi, Python.
2) Lambda functions and full lexical closures. Every function treated as data comes with a closure by default, even when it has no closed-over lexical variables. When a function lacks such variables, the compiler can optimize out the closure storage, but the type system must treat closure-less and closure-carrying functions as having the same type! Tail-recursion optimization will be mandatory.
3) Exception handling as seen in Java, C++, Delphi, Python and practically every other modern language. Not having it is downright silly, especially when a clever compiler can allocate an exception object on the stack, thus allowing even the heap allocator to safely throw exceptions.
4) Templates, as almost directly stolen from D. Only types can be passed to templates.
5) Object orientation based on generic functions and specializing multi-methods. However, there won't be pre or post methods, and each class can only inherit from one other class (thus VASTLY simplifying inheritance-related problems). Generic functions and specializing methods are used to provide dynamic method dispatch.
6) The unification of tuples, structures, and classes. A class is merely a structure that can inherit from another class, have methods specialized on it, and have multiple scopes (private, public, protected, et al). A structure is merely a tuple with named members. I see no reason that I can't declare a tuple type in which only some members have names, and leave the language to provide indexes for all, then declare a new tuple type that inherits from the first, and finally specialize a generic function on that latter type.
7) Dynamic arrays, stolen right from Delphi. I will also include static arrays just for the convenience of allocating a known number of items off the stack when heap space often runs scarce at the lowest level.
8) Typed reference values. These function like pointers in that their implementation identifies an address in memory, but to the programmer they can either contain the NULL value or reference a known object whose address was obtained via a take-the-address operator. Objects (as in instances of classes) will be referred to by references only. I'm actually quite on edge about this one, as it will introduce the possibility of dangling references. The level of safety provided by an utterly pointer-less language would be nice, but so would not copying huge amounts of data to pass an object into a function.
9) Blocks of code marked unsafe in which pointers, pointer arithmetic, inline assembly and re-interpretive type casting all become legal. These should only be used for ultra-low-level functions such as writing to device registers, and a perfectly-formed unsafe block will make the compiler emit a warning unless the programmer also marks it assumed safe.
10) Let blocks for variable declaration. Sod just declaring a new variable out of the blue, we should be marking where our variables go into and out of scope.

Features left out as deliberate design decisions are:
1) Type inference. It requires writing an inconveniently extensive type system and disallowing either references or mutable variables.
2) Garbage collection. You guys will hate me for this (because everyone and everything has garbage collection nowadays), but when writing an operating system or a device driver one can't find themselves left at the mercy of a GC, either for performance or safety reasons. For these domains you need a language with such a small runtime-library you can write it in 3 pages of code, or none at all.
3) Looping constructs. In the presence of tail-recursion optimization and templates, they simply aren't needed. Let people write functional code!
4) Concurrency. Once again, you simply don't assume certain language features at the lowest level.
5) Implicit type conversions. The only working implicit conversion will be the assignment of an integer to a floating-point variable.
6) Operator overloading. It makes parsing that much harder.

Now, the things I need your advice about are:
1) Syntax. Should I stick with C-like syntax for familiarity, switch to a Lispy s-expression syntax for metaprogramming capabilities, or use some completely other syntax I haven't thought of?
2) The references. Is not copying data really worth aliasing bugs?
3) Any advice on the implementation of generic-function dispatch, particularly in the case where methods might only become known at link time? I've got a system that works fairly well already, but it requires that every specialization of a generic function live in the same module that declares and implements that generic function. Still, under that assumption, it can type-check and ambiguity-check the generic-function call at compile time, leaving only a small list of type-safe possible dispatches to be looked up at runtime (see the sketch just after this list).
4) Do you think I've completely missed something?
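To make question 3 concrete, here's a rough C sketch of the runtime half of the dispatch scheme I have in mind. Every name in it is made up for illustration, and it shows dispatch on a single argument where the real thing would match a tuple of type tags:

/* Rough sketch, names invented for illustration: the compiler has
   already type- and ambiguity-checked this generic-function call and
   narrowed it to a small candidate list, so the runtime dispatch is
   just a scan over type tags, most-specific candidate first. */
#include <stddef.h>

typedef int TypeTag;                        /* one tag per concrete class */

typedef struct {
    TypeTag specializer;                    /* tag the method specializes on */
    void  (*method)(void *arg);
} Candidate;

/* Emitted per call site: only the type-safe possibilities survive. */
void dispatch(const Candidate *cands, size_t n, TypeTag tag, void *arg) {
    for (size_t i = 0; i < n; i++) {
        if (cands[i].specializer == tag) {  /* first hit is most specific */
            cands[i].method(arg);
            return;
        }
    }
    /* unreachable if compile-time ambiguity checking was complete */
}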

I've got a textbook on compilation techniques and I'm raring to go at this, so please write in any ideas or criticisms you might have. Except about the mandate of garbage collection. I refuse to have my thread scheduler pause for a GC, though I feel quite happy about having one for writing desktop applications.


As for pointer safety and

As for pointer safety and dangling references, you might want to take a look at Cyclone.

Jackpot!

Cyclone's region-based pointer safety is just the thing to make references much, much, much safer! Thank you! Since you can only get a valid reference by either taking the address of a variable or using a heap-allocation operator that can see where it allocates from, annotating references with regions in the compiler should work.

Reading papers now...

Unfortunately not

The pragmatic experience with region inference was disappointing.

Really?

Can you describe anything you might have managed to learn as a result? The Cyclone papers don't seem to have focused on learning from failures ;-).

Unfortunately not

My comment is based on anecdotal discussions with various members of the Cyclone team. The sense is that region annotations are either trivial or impossible in the eyes of most users, and the sheer volume of annotation involved became insurmountable.

But I think that this is properly a question for Trevor.

I've always wondered whether

I've always wondered whether some useful, ancillary construct could be used to express resource usage intent and inform region inference. For instance, adding "processes" to a language could inform which regions are shared, which are uniquely owned, etc. In this way, it might be clearer how to assign regions, but that solution falls out "by accident", and not by manual annotations.

By having the user break up their problem using ancillary constructs that provide other advantages, i.e. mobility, fault tolerance, etc., resource usage information becomes more precise. Anyone have any links to research on this? I'm sure someone's at least thought of this before.

Too many paradigms

IMHO, the scope you defined is too ambitious.
If you really feel you are ready to implement all that, I suggest carefully planning the scope.
Two approaches I see are:
1. Feature-by-feature implementation, start with core features, release often, add new features based on your own feedback.
2. Create a lot of mostly independent pilot projects for each coherent set of features you want to ultimately implement.

While the first approach will give you something that runs some real code earlier, the second has the potential of giving you a deeper understanding of language design.

Of course, there is the whole spectrum of approaches in between these two extremes; depending on your goals - either to hack up something that makes your everyday job easier or to grok the language design and maybe someday design a new killer PL - you may adjust your plan of attack.

Further, if we consider other motivational dimensions: your goals may be, for example, to study efficient code generation for various platforms, or designing custom hardware for your language, or designing your language for specific hardware, or efficient distribution of code, or a cool new versioning and upgrade idea...
Without knowing all the little details of your motivation, it's hard to give an informed suggestion.

Scope?

Well, removing looping constructs and GC removes me from the set of people interested in your language, but:
- for exceptions, provide the D 'scope' construct; it's useful, I think.
- for the syntax: IMHO, Scala currently has one of the best syntaxes (except that it doesn't have active strings, I think).
- for type inference: local type inference is good enough; would it be simpler to implement?

OK then.

Well, removing looping constructs and GC removes me from the set of people interested in your language, but:

Honestly, what's so bad about not having looping constructs? I could add them back, but they're superfluous in the presence of tail-recursion optimization.

Looking into local type inference and Scala's syntax. The D 'scope' construct is definitely a good feature to have, and I'll look into adding it when I'm done with the basic set of features already in the language (which may well be too large, hence not including loops, GC, or general type inference).

Loops not superfluous

Yeah. That's what we thought too. Then we noticed that tail-recursive lambda loops are inner procedures. Any simple compiler (bitcc qualifies, and the front end is approaching 27,000 lines of C++) is going to flatten those into top-level procedures and introduce a closure construction. Nothing logically wrong with that, but see my statements elsewhere on this page about type-partitioned heaps and semi-static allocation. In summary: inner procedures bad in high-performance kernels.

Now a more sophisticated compiler can certainly get around that issue, and there are many compilers for other languages that do so. Problem is: the compiler isn't our end goal; the safe system is. Good compilers for LISP, Scheme, Haskell, ML, etc. took a long time to evolve. The situation is better now that things like LLVM exist, but we expect that a really good compiler for BitC is going to take a while. In the meantime we want to build systems. Even when the compiler gets to be sophisticated, it remains important that the specification of "things that are guaranteed not to allocate" NOT depend on implementation-specific optimizations (or their absence).

Then there are some issues around break/continue/return. Break-with-value can be implemented using exceptions, but you need polymorphic exceptions to do it, and those have a nasty way of becoming un-catchable if they manage to escape the scope of their definition. And things can get strange very quickly if you have first class procedures and lexically scoped exceptions, because suddenly those exceptions have dynamic extent and you need to disarm them, which becomes a problem if you want to use break-with-value in a pure language subset. Some discussion here under the subject line "Error in specification of DO", end conclusion was that we would simplify our lives a lot by making the core iteration construct first-class.

There aren't any good (i.e. efficient) techniques for compiling general tail recursion to C if separate compilation is a goal (and it should be). You may want to consider adopting the pragmatic restriction on tail recursion that BitC adopted. Gets 99.44% of the benefit, but compiles to C nicely.

One of the hugely hard things about designing a safe and useful systems language -- and I'm not claiming that we got this right in BitC -- is that you need to simultaneously get the semantics right and the pragmatics right. There are things dictated by pragmatics (representation, mutability, semi-static allocation) that many PL practitioners view askance, because they manifest as seemingly arbitrary (from the formal perspective) restrictions in the language design, duplication in the core semantics, or as plain old icky warts. None of that is good, because it makes the meaning of the program progressively harder to specify and reason about formally. Thankfully that's only important if we care to be able to analyze whether the kernel is correct in even a limited sense. But if we didn't, we wouldn't be doing BitC...

Point taken.

Point very taken.

Errr... on further thought... When you say:

Yeah. That's what we thought too. Then we noticed that tail-recursive lambda loops are inner procedures. Any simple compiler (bitcc qualifies, and the front end is approaching 27,000 lines of C++) is going to flatten those into top-level procedures and introduce a closure construction. Nothing logically wrong with that, but see my statements elsewhere on this page about type-partitioned heaps and semi-static allocation. In summary: inner procedures bad in high-performance kernels.

Honestly, most of the time I remember introducing an inner procedure, it ran something like this:

(define (reverse l)
  (define (do-reverse l result)
    (if (null? l)
        result
        (do-reverse (cdr l) (cons (car l) result))))
  (do-reverse l '()))

I can't guarantee this isn't a quirk of my coding style.

The outer procedure tail-calling the inner procedure just exists to introduce the initial value of one or more arguments to the inner procedure (often, as here, an accumulator). Now correct me if I'm failing to see something here, but why would a compiler generate a closure construction for an inner procedure (or a flattened inner procedure) that doesn't close over any variables?

OTOH, any non-smart compiler would introduce one extra stack frame for that code even without a closure.

My approach to solve this would be to, like you did with BitC's tail recursion issue, take a pragmatic path. Since my language doesn't have a letrec construct (it isn't Scheme-based), I would add a special syntactic construct for writing procedures like the above whose real implementation should be:

(Pseudocode)
define f(x,y) {
  define do-f(x,y,z) {
    ...
    return do-f(new-x,new-y,new-z)
  }
  return tail-call do-f(x,y,first-z)
}

Where "tail-call" means only allocating new space on the stack for z (or just adding z to the current environment frame, more Scheme-ishly), and then jumping directly into do-f without pushing a return address. Though ugly in Scheme syntax, it would look fairly clean in C syntax:

List reverse(List l) (result=LIST_NULL) {
  if(l == LIST_NULL)
    return result;
  else
    return reverse(cdr(l),construct_list(car(l),result));
}

Putting aside the syntax, the idea is to overload the function with one form recognized in the function's own scope, the tail-recursive function that requires a "result" argument, and one defined outside the function's scope, that being the one that just allocates the first value of result and jumps into the tail-recursive function.

This, coupled with the BitC move of only optimizing tail-recursion for a call determinable as tail-recursive at compile time, wouldn't cover the general case of tail-recursive inner procedures, but IMHO it would pragmatically cover enough cases to make the use of tail-recursive functions instead of added looping constructs practical.

Some kind of defrec (DEFine RECursive function) construct with static-only tail-recursion optimization still leaves a cleaner language than one with general recursion, while, and do-until all in use.
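For concreteness, here's roughly the C I'd want my compiler to emit for the reverse example above. This is a sketch under my own assumptions, with construct_list and the list representation as stand-ins:

/* Sketch of the intended lowering: the outer, overloaded entry point
   initializes the hidden accumulator, and the statically-recognized
   self tail call becomes a loop back-edge, so no frames accumulate. */
typedef struct Cons { int car; struct Cons *cdr; } Cons;
typedef Cons *List;
#define LIST_NULL ((List)0)

extern List construct_list(int car, List cdr);  /* stand-in allocator */

List reverse(List l) {
    List result = LIST_NULL;        /* first-z: initial accumulator value */
    for (;;) {                      /* body of the inner do-reverse */
        if (l == LIST_NULL)
            return result;
        result = construct_list(l->car, result);
        l = l->cdr;                 /* the tail call: reuse this frame */
    }
}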

As for break, continue, and return in BitC: I actually agree with you on your situation, except for one thing: IMNSHO, under no circumstance should it be at all possible to capture loop context for break/continue in a closure! Once you allow that, you've actually made the programmer's life more complicated than if you had given them first-class continuations to work with.

Oh, and please excuse that I haven't figured out how to make indentation show up in code tags yet.

Edit: OH SOD IT, I'll just throw a couple of kinds of loops in for pragmatism's sake.

Or I'll just use Dylan-like syntax like I'm leaning towards and put loops in as macros.

Why would a compiler

Why would a compiler generate a closure construction for an inner procedure (or a flattened inner procedure) that doesn't close over any variables?

Well, by and large it wouldn't, but unfortunately loops don't tend to fit this description. Most loops make reference to at least one variable appearing in a lexically enclosing scope. That constitutes closing over those variables.

What is called for here is escape analysis showing that the inner lambda does not escape either inwards or outwards. At that point it doesn't need to be lifted. We haven't yet done that analysis pass in BitC. It's in the queue, and it would make a bunch of things much simpler, but it still wouldn't address the mess we just encountered with labeled blocks and escape-with-value.

IMNSHO, under no circumstance should it be at all possible to capture loop context for break/continue in a closure!

Turns out that doesn't work. Even a simple loop has to close over the escape-with-value label so that it can escape even once. Also, you would like an inner loop to be able to do an exit-with-value from its outer loop. Now if loops are first class there is no problem with any of this, but if loops are implemented by lambda and tail recursion, then you're kinda stuck.

What you really mean to say here is that they shouldn't be captured in closures of escaping procedures. And I definitely agree, but that brings me back to the fact that we haven't done the escape analysis pass yet...

Closure construction

Yeah. That's what we thought too. Then we noticed that tail-recursive lambda loops are inner procedures. Any simple compiler (bitcc qualifies, and the front end is approaching 27,000 lines of C++) is going to flatten those into top-level procedures and introduce a closure construction.

Is that because you are targeting C or is it a general thing?

I am planning to pass a "context" argument to my first-class functions that points to the closure or, in this case, to the stack frame of the function containing the loop. That gives access to the variables defined before the loop. What do I miss?

Context Arguments

AFAIK, you shouldn't actually miss anything. It just requires finding the frame pointer of the parent procedure, something technically impossible in C without assembly code (as you well know, of course). Add in the fact that their compiler would need to recognize the special case of an inner procedure definition whose resulting function never gets passed up the stack and compile for that, and I guess they just decided that adding looping constructs was easier than re-architecting their compiler around the need to generate inline assembly and optimize for one special case.

This is, of course, mere speculation on my behalf. I welcome Dr. Shapiro's corrections.

Tradeoffs.

In a stack-based

In a stack-based implementation, you don't want to just take a pointer to the containing frame, because the inner procedure can escape and the containing stack frame can then disappear. Instead, you generate explicit code to populate an explicitly heap-allocated closure record.
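A minimal sketch in C of what that lifting looks like, with all names invented for illustration: the inner procedure becomes a top-level function, and its free variables are copied into a heap-allocated record rather than referenced through the (possibly dead) enclosing frame.

/* Hypothetical example: make_adder returns an inner procedure that
   closes over `step`. After lifting, the closure record carries a
   copy of the binding, not a pointer into make_adder's stack frame. */
#include <stdlib.h>

typedef struct AdderClosure {
    int (*code)(struct AdderClosure *env, int arg);  /* lifted procedure */
    int step;                                        /* captured variable */
} AdderClosure;

static int adder_code(AdderClosure *env, int arg) {
    return arg + env->step;          /* reads the copied binding */
}

AdderClosure *make_adder(int step) {
    AdderClosure *c = malloc(sizeof *c);
    c->code = adder_code;
    c->step = step;                  /* populate the closure record */
    return c;                        /* safe to escape this frame */
}

/* call site: int r = c->code(c, 41); */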

You don't want to copy that frame naively into the heap, because this leads to false garbage retention. Andrew Appel did a nice paper on this, but the name is not coming to mind at the moment.

One nice thing about the current BitC implementation is that each pass is a BitC->BitC conversion. There is a --dumpafter option you can use to see what the output looks like after each pass. So you can feed in a small example and watch the transform happen, which is occasionally a cool thing.

Safe-for-space closure conversion

You don't want to copy that frame naively into the heap, because this leads to false garbage retention. Andrew Appel did a nice paper on this, but the name is not coming to mind at the moment.

I'm guessing that's either Efficient and safe-for-space closure conversion (2000) or Space-Efficient Closure Representations (1994), both of which he coauthored with Zhong Shao.

Closures and CPS

Thanks, I think I will need to read that as it is not yet clear to me how to manage multiple stacks. I am using CPS as a "native" form in dodo.

I think the main issue is

I think the main issue is that escape analysis is subtle, so the conservative thing to do is to lift all inner procedures to top-level procedures that take an explicitly constructed closure. That is what we do. The only piece of target-specific assembly code we have is the code for MAKE-CLOSURE, which heap allocates an executable code block for the procedure object.

The easy alternative to heap-based procedure objects is to simply put a closure pointer into every procedure, but this doesn't interact well with cross-language linkage.

As a practical matter, the only restriction that we really incurred as a consequence of going to C was the restricted form of tail recursion that we adopted.

Local type inference is well

Local type inference is well worth having and lightweight.

If I were you my approach would be: decide on a subset of the language (or even a bytecode) on which to base the rest, then define everything else using that subset. That will help you keep features in check and simplify implementation. I am taking this approach for dodo.

As for concurrency, I would recommend including it using the share-nothing model (consider that separate threads are separate processes communicating via signals and messages).

If you do without garbage collection, check if the Objective-C approach of allocation pools works for you. That is simpler and requires no runtime; however, it is not that easy to use (obviously).

For the syntax I think that Python has quite a nice syntax, the only problem is that it relies on whitespace. If you can correct that it would be a good base.

Re: Local Type Inference is Well

Local type inference is well worth having and lightweight.

I'm looking into local type inference, since from the look of it local type inference can work better for polymorphic functions than templating. OTOH, can local type inference work for templated classes? If I want to write, say, a hash table, is there a type-inference algorithm that lets me specify the storage and methods of the hash table while leaving the actual type stored in the table as an inferred type variable?

If you do without garbage collection, check if the Objective-C approach of allocation pools works for you. That is simpler and requires no runtime, however it is not that easy to use (obviously).

Do you mean reference counting with autorelease pools? That's all I've found on the web when searching for "Objective-C allocation pools".

My plan as it stood was to include two safe allocation operators: one that casts a static array of bytes to a dynamic array of bytes and initializes the dynamic array with the correct length value, and one construction operator that constructs a class instance or a dynamic array of an arbitrary type from a dynamic array of bytes, shortening the array or throwing an exception if the array doesn't have enough space left.

This could work quite well with the Cyclone region idea mentioned above. Each static byte-array (or pointer to bytes in marked-unsafe code, ex: page returned from sbrk()) cast to a dynamic array of bytes would be considered its own region by the compiler. Coupled with each lexical scope having a region and, possibly, each dynamic array of bytes (should probably just call that a "heap" type) having its own lexical scope of valid use (as in Cyclone), providing reference-safety without a GC can definitely work.
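Here's a quick C sketch of those two operators as I picture them; everything in it is invented notation, not a commitment:

/* Sketch of the two safe allocation operators, in invented C terms.
   A ByteArray is one "region"; construction carves an object off its
   front and shortens it, failing where the real language would throw. */
#include <stddef.h>
#include <stdint.h>

typedef struct { uint8_t *data; size_t length; } ByteArray;

/* Operator 1: bless a static byte array as a dynamic one. */
#define AS_BYTE_ARRAY(buf) ((ByteArray){ (buf), sizeof (buf) })

/* Operator 2: allocate size bytes out of the region, shortening it. */
void *construct_from(ByteArray *region, size_t size) {
    size = (size + 7) & ~(size_t)7;          /* keep objects aligned */
    if (region->length < size)
        return NULL;                         /* "throw": region exhausted */
    void *obj = region->data;
    region->data   += size;
    region->length -= size;
    return obj;
}

/* usage:
     static uint8_t arena[4096];
     ByteArray r = AS_BYTE_ARRAY(arena);
     struct Foo *f = construct_from(&r, sizeof *f);   */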

As for concurrency, I would recommend including it using the share-nothing model (consider that separate threads are separate processes communicating via signals and messages).

I really, really, really don't think so. Honestly. Kernel programming is complicated enough when you have to consider how the OS you're coding implements concurrency/parallelism. Why make the programmer consider their language's model below and their OS's model above?

Yes the proper name is

Yes the proper name is autorelease pools.

I would use a named type for a template, even if it is a generic type.

Finally do you know many kernels that are not multi-threaded?

Multi-threaded kernels...

Get that way by the will of their programmer. I've seen enough varying threading models in different kernels, from old-school Unix processes, to QNX threads, to Plan 9 share-some-stuff processes, to Linux threads and more, to know that you don't tell a kernel programmer what their threading will look like. EVER. They/we hate that like nothing else. Really, and we haven't even gotten into scheduling models. How do you implement Lottery Scheduling or Real-Time Scheduling when your language uses Prioritized Round Robin? It's a bloody PITA to have your language specifying "for your convenience" the exact thing you intended on programming, but with a different algorithm that provides different benefits and incurs different costs.

On the type inference... I think I'll *just* implement templating of types and functions for now. Unless local type inference can somehow save my compiler from generating a new, template-substituted function for each unique instantiation, I can always add local type inference once the language actually works.

For a concrete look at one implementation...

Eli: given the state of the current comments (or more precisely, lack thereof) I hesitate to suggest this, but you might want to look at the BitC type inference implementation. It's basically a classic Hindley-Milner engine extended to type classes and abstraction over mutability.

This week our tree is bouncing around a lot as we gear up for "release -2" and we resolve some dangling clean-up issues, but if you think it might be useful to have a look, the mercurial repo is here.

Thanks.

I'm reading the papers on mutability in BitC and will go through the code when I have time. Mutability was really my chief worry in declining to include type-inference. I know it's theoretically possible, but when I said "Turing-award winning type system", I meant it probably will win an award if it makes full and unlimited (a la C) mutability with full, annotation-less type inference actually *work*.

Well, since it works...

Somebody owes Swaroop a Turing award. :-)

Works is good, but works and formally proven is better. Swaroop has done both.

For the syntax I think that

For the syntax I think that Python has quite a nice syntax, the only problem is that it relies on whitespace. If you can correct that it would be a good base.

I always thought Python's syntax was so nice because of the whitespace. You would have to introduce curly brackets or similar and rely on semicolons to correct the dependency on whitespace. This will end up in C-like syntax, which is more verbose and uglier, but easier to implement.

Dylan, Goo

You might want to take a look at Dylan and Jonathan Bachrach's Goo, which share some of your goals and features.

Both languages have restartable exceptions (inspired by Common Lisp's) that leave "Java" exceptions behind. Restartable exceptions are not much harder to implement than ordinary exceptions and AFAICS don't incur any additional runtime overhead (although there are some tricky details to work out).
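To make the difference concrete, here is a toy sketch in C of the restartable flavor; every name in it is invented for illustration. The handler runs at the point of the error, before any unwinding, and hands back a value with which the faulting computation simply resumes, which is exactly what a Java-style catch cannot do:

/* Toy sketch of resumable handling, all names invented. A single
   global handler stack stands in for dynamically-scoped handlers. */
#include <stdio.h>

typedef double (*handler_fn)(const char *condition);

static handler_fn handler_stack[16];
static int handler_top = 0;

static void push_handler(handler_fn h) { handler_stack[handler_top++] = h; }
static void pop_handler(void)          { handler_top--; }

/* Signal without unwinding: the innermost handler runs right here and
   returns the value with which the computation resumes. */
static double signal_condition(const char *condition) {
    return handler_stack[handler_top - 1](condition);
}

static double careful_div(double y, double z) {
    if (z == 0.0)
        return signal_condition("divide-by-zero");
    return y / z;
}

static double use_one(const char *condition) {
    printf("recovering from %s with 1.0\n", condition);
    return 1.0;                     /* a "use-value" restart, in effect */
}

int main(void) {
    push_handler(use_one);
    printf("%f\n", careful_div(4.0, 0.0));  /* prints 1.000000 */
    pop_handler();
    return 0;
}

A real implementation would also offer multiple named restarts and unwind on demand, but the run-the-handler-before-unwinding property is the whole trick.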

BitC and PreScheme

I would pay close attention to BitC and PreScheme.

FYI, it isn't true that you need a Turing-award-winning type system to do "systems programming;" see Lightweight static resources, for example.

... but sound and complete inference over mutability helps

As Paul knows, one of the challenging issues in BitC has been developing a sound and complete type inference mechanism that works right over mutability. Swaroop finally has one, and the publication has just been submitted. As always, we'll have to see what the committee thinks. I suppose they will let us know.

While Eli seems disinclined to adopt type inference, we actually feel that it's a very important thing for a safe systems language. More precisely, the ability to express certain kinds of abstraction over types is important, and once you introduce type abstraction you need to resolve the type variables somehow -- which rapidly pushes you down the type inference path. Getting there wasn't half the fun, but now that we're there I wouldn't go back.

The two remaining items that we need to shoe-horn into BitC to have it be fully functional as a systems language are existential types and single inheritance for structure types (including the obvious subtype support). From where we now stand, the subtyping support doesn't look problematic. Existential types may prove more complicated.

Certainly not a Turing-award-winning type system, but the more help you get out of your type system the better off you are, and the complexity we bit off does seem to be paying off. The trick is not to take the big leap over Pierce's Cliff (see "Type Systems Considered Harmful", previously discussed at LTU here).

You don't have to be crazy to develop BitC... :-)

shap: As Paul knows, one of the challenging issues in BitC has been developing a sound and complete type inference mechanism that works right over mutability.

I take this to be Jonathan's polite way of saying that there is novel, perhaps even Turing-award-worthy, work showing up in BitC. I should definitely clarify that my comment about "You don't need a Turing-award-winning type system" only applies to the amazing resource-aware computing effort of Oleg Kiselyov and his colleagues that I wished to draw attention to; it is not intended to imply that no Turing-award-level type system can, or should, be designed to address the problem. Indeed, Jonathan is correct to observe that I continue to follow BitC closely, and am very much looking forward to the next release, as every tarball I've pulled down has failed to make due to a missing "ASTBuilder," which appears to be a tool that is not present and not built in the BitC tarball. My guess is that it's vestigial, stemming from the separation of BitC from the larger Coyotos effort, but I haven't had the time to delve into that hypothesis.

Garbage collection

I certainly don't think anyone should hate anyone who is suspicious of GC because they think it might be problematic in systems programming. It's a fact of many kinds of task one might wish to perform there that one needs to have a kind of understanding of execution costs that GC obscures.

It's not, however, the case that eliminating GC from the language spec magically means that the costs of resource management will be transparent. In fact, if you make use of closures and first-class functions in your code, it is a fair bet that understanding the performance will be hard work.

I'm guessing that if you agree with the above, you'll see that the BitC that Paul recommended above doesn't fall too far short of your wishlist.

Not consistent with my experience, at least

My experience implementing several operating systems is that GC is not a problem in a well-structured OS kernel, because well-structured OS kernels perform a semi-static initial partitioning of memory into a type-partitioned heap and then implement free lists over those types by hand. In such a system, GC never triggers because nothing is ever allocated after that initial partitioning.

Supporting this idiom is why the bitcc compiler has a --no-alloc option. It lets us statically confirm that code which is not supposed to allocate anything on the heap really doesn't.
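For readers who haven't seen the idiom, a C sketch of one such hand-implemented free list over a type-partitioned pool; the names are invented for illustration:

/* Sketch of the idiom: partition memory per type once at boot, then
   run a hand-written free list over it, so nothing is heap-allocated
   (and no GC can trigger) after initialization. */
#include <stddef.h>

typedef struct Thread { struct Thread *next_free; /* ...real fields... */ } Thread;

#define MAX_THREADS 256
static Thread thread_pool[MAX_THREADS];   /* the semi-static partition */
static Thread *thread_free_list;

void thread_heap_init(void) {             /* runs once, at boot */
    for (size_t i = 0; i + 1 < MAX_THREADS; i++)
        thread_pool[i].next_free = &thread_pool[i + 1];
    thread_pool[MAX_THREADS - 1].next_free = NULL;
    thread_free_list = thread_pool;
}

Thread *thread_alloc(void) {              /* O(1); never calls malloc */
    Thread *t = thread_free_list;
    if (t != NULL)
        thread_free_list = t->next_free;
    return t;                             /* NULL: hit the type's watermark */
}

void thread_free(Thread *t) {
    t->next_free = thread_free_list;
    thread_free_list = t;
}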

Whether BitC is the right solution definitely remains to be seen, but we're at least reaching the point where I can see the headlight of the on-coming train. :-) Watch this space for news shortly.

You're Mostly Right

Oftentimes kernel coders *do* find it useful to pass a few smaller data structures around, though the "big" architectural ones do get allocated in big static blocks.

But those smaller data

But those smaller data structures are almost universally stack allocated.

Observation: using malloc and free without guaranteed per-type low watermarks in the heap, and without a further guarantee that those low watermarks ensure forward progress, is reliably crashworthy. This is why all major kernels today use a (sometimes ad hoc) typed heap allocator. It is also why all critical system kernels use [semi-]static typed heaps.

Triggering GC

In such a system, GC never triggers because nothing is ever allocated after that initial partitioning.

I think what I wrote was not quite what I meant to write: I meant to say that you cannot make unrestricted use of higher-order programming techniques in performance critical sections of code because you will risk triggering either GC or whatever other expensive memory management technique is needed to handle closures.

Out of interest: does BitC enable any static checks to ensure that certain sections of code can't trigger GC?

bitcc --noalloc

If the --noalloc flag is given at the compiler command line, the compiler will flag as an error any construct that would implicitly or explicitly result in a heap allocation.

This is not a mandated language feature, but it's certainly something that any sensible compiler should implement. We originally intended to have a (noalloc expr) form, but that can only be validated on a whole-program basis, and we're not there yet.

(noalloc expr)

We originally intended to have a (noalloc expr) form, but that can only be validated on a whole-program basis, and we're not there yet.

Thanks, that's what I was asking after, yes.

Local Type Inference

Is commenting on my own post like this bad form? Oh, well, won't do it again if anyone says so.

I've got a question: does local type inference mean inferring the types of variables based on their use only within their scope without performing the full procedure of assigning type variables at scope boundaries and later performing unification?

If so, I figure it works like this: programmers annotate procedure boundaries (arguments and return types) and, of course, global variables (BAD FORM, those). From there, local type inference can infer the types of the resulting local variables. I'll actually have local variables declared with a Scheme-like let statement (because I find variable shadowing the most elegant way to locally override a value), so from the outermost scope of the procedure the type inferrer can determine the types of variables declared in the let, and from there recurse into the scope of the let block with no need for type variables.

If I'm not totally off my rocker here, you guys have made a convert to the Holy Church of Not Declaring the Obvious for the Compiler's Sake. I still don't want to include full type inference, however. From my experience in type-inferred languages (a little) and my reading of the theory, it has a tendency (much like a dynamically typed language) to catch type errors somewhere far away from the expression/statement that actually created them. While this doesn't appear to be a fundamental problem, I feel I should wait a good few years to see research advances before deciding that everything possible should be fully type-inferred.

Fully-declared types at library boundaries (ie: procedures whose types are fully known at their declarations) provide a much simpler basis for a foreign-function interface, anyway.

Don't include `statements'

2) Imperative block structure based on statements.

Having statements in the language means there are two kinds of continuations: those that expect a value and those that do not. This can make the language awkward. For instance, it is often the case in C macros that you need two forms of each macro, one for the `void' case, one for the rest. Having `try/catch/finally' be a statement was a huge mistake in Java because it means that you cannot wrap a `try' around an expression, nor can you `continue' out of a catch with a value. You have to rewrite your expressions as statements and explicitly name the temporary variables so that you can cross the block boundaries.

Just make everything be an expression.

you cannot wrap a `try'

you cannot wrap a `try' around an expression, nor can you `continue' out of a catch with a value. You have to rewrite your expressions as statements and explicitly name the temporary variables so that you can cross the block boundaries.
...or put the whole block in a function and use "return". Is it that bad?

I understand your point, though.

Not bad...just awkward

...or put the whole block in a function and use "return". Is it that bad?

It's not bad, but it is more boilerplate.

Here are some concrete examples in Scala. I picked Scala here because it supports as much imperative programming as you want in a very C/C++/Java-ish way, but it's expression oriented which allows the kinds of constructs that Joe was talking about.

In this example the try block is an expression that evaluates to an object of type Some[Float] (I assume y and z are Floats). The catch block is a bit like a partial function from Exception to None; case is how you write pattern matching in Scala. In typing the whole try/catch expression the type inferencer unifies Some[Float] and None into Option[Float].

val x = try {Some(y/z)} catch {case e:ArithmeticException => None}

Note that the expression based try/catch doesn't preclude the user from using it as a statement like so:

var x:Option[Float] = None
try {x = Some(y/z)} catch {case e:ArithmeticException => ()} // do nothing, already taken care of

In the above code, what appears to be statement (x = Some(y/z)) is also an expression that returns the unit value, which Scala syntax represents as (). It's basically the same idea as void, except that it can be treated as a value. The compiler can optimize away most uses of unit into void style code.

Finally, it also doesn't preclude following your suggestion of defining a function with explicit returns. In this example these returns are redundant, but I wanted to contrast this approach with the original expression based approach.

def divide(y:Float, z:Float): Option[Float] =
  try { return Some(y/z) } catch { case e:ArithmeticException => return None }
val x = divide(y,z)

Hopefully all of this shows that even the most imperative of languages can benefit from being expression based. Users who think in terms of statements can certainly use the language that way. But users who get comfortable with expressions will often see their code get compressed.

Pick your style

In the language I am working on (dodo) you can choose the boilerplate way, which I expect to be the most frequent, or the short expression-based way. The imperative style is transformed into continuation-passing style. There is a single catch block per function, which contributes to writing more functions. An example of expression-based code:

def x = (y / z) ->| () null

-> is for the default continuation, here it is not used (simply returns the value.)
| is for the alternative continuation (which will be reached in case of division by zero.)
() is the list of parameters of the continuation

I am missing something obvious

Statements can be modeled as expressions returning Unit, so I'm not sure why there is anything interesting here.

Given that Unit is often *represented* as void for purposes of cross-language compatibility, there is some hair here, but I don't see that it is block-vs-expr hair. I'm surely missing something obvious, so I wonder if you might expand on this?

Let's look at Java. Java

Let's look at Java. Java has both statements and expressions and these are built in to the language at a very low level. The bytecode verifier places constraints on the instruction stream and one important constraint is that the stack level can be proven to be independent of the control flow at the control flow join points. That is, wherever control flow merges, the stack level has to have the same depth no matter how you get there.

Some operations in Java are guaranteed to change the stack level, others are guaranteed to leave it alone. Function calls always leave the return value on the stack, so the stack should be deeper after a call instruction. Statements always leave the stack the way it is.

Semantically, this means there are two kinds of continuations at the bytecode level: ones that expect a value to be pushed and ones that expect the stack to remain unaltered. These are expression and statement continuations respectively. They cannot be interchanged. At the bytecode level, you must not attempt to pass a value to a statement continuation (i.e. change the stack level at a control flow join), or attempt to avoid passing a value to an expression continuation (i.e. not leave a value on the stack when one is expected). The bytecode verifier enforces these restrictions.

The Java source language helps out a little bit. If you use an expression in a statement context, it is assumed that you wish to discard the value and a pop instruction is emitted to ensure the stack stays level. And it is syntactically impossible to use a statement as an expression (although one could certainly imagine an extension to the language to permit this).

So what's the problem? Statements and expressions are no longer interchangeable. Suppose we had a function foo that had some complicated mathematical expression in it. Maybe something like

x + y**2 + sin(x)/x

Now you find that you get a divide by zero error when x = 0. If we had catch expressions, you could easily wrap the offending expression (maybe like this):

x + y**2 + (catch (sin(x)/x) on-error 1)

When sin(x)/x threw a divide by zero error, you just toss it and use the 1.

But catch isn't an expression, it is a statement. So you have to pull apart the code and rewrite it:

float term3;
try {
  term3 = sin(x)/x;
} catch ... {
  term3 = 1;
}
... x + y**2 + term3 ...

And since the expression can be arbitrarily deep, you could end up with the refactored code arbitrarily far away from where it is used.

Note too that since try/catch is a statement, you cannot return a value from it. We have to introduce a temporary variable and assign to it.

Try/catch would be more useful as an expression, but because it is a statement, we have to write `adapter' code to get around the problem of having to use it in an expression context. Now this is unpleasant, but it gets worse.

Throw requires a value. That is, the continuation that is established by the TRY clause is an *expression* continuation! TRY is a total mess: it can't be used where it is needed most, and it doesn't capture the continuation at the point of use. The CATCH clause is a mess as well. It expects a return value, but because it is part of the TRY clause, it appears as a statement, so it cannot return a value itself. TRY and CATCH are expression-oriented constructs (they establish expression continuations and take values) that are wedged into statement context (where they are not allowed to return values) and the user must write the adapter code in the form of temporary variables in order to fix this mismatch.

Now you might think this is a small inconvenience, but if you are compiling from a language that has catch expressions into Java or into Java byte code, you can't just recursively descend your flow graph and generate the code. You have to hoist the catches up to the nearest statement set and arrange for the appropriate temporaries to be introduced and patch up the control and data flow for this sort of idiocy.

The alternative of simply making *everything* an expression (that computes a value, whether you use it or not) is simpler. Of course you can have syntactic statements, where the return value is discarded, but that doesn't have anywhere near the problems of a semantic division between statements and expressions.

This doesn't smell right

I see what you are getting at, but I think there is an error in your analysis. The nice part about declaring that "syntactic statements are merely expressions returning unit" is that unit has no representation. Viewed from a certain perspective, this is why Java statements do not modify the size of the stack. The fact that catch is syntax doesn't alter matters a bit. The example you give expresses a purely syntactic artifact of the Java specification. I agree that the syntax does impose the restriction you identify, but this does not preclude treating all statements as expressions returning unit -- including try/catch.

In fact, I can show you a meaning-preserving rewrite from Java statements into pure expressions. So I do not buy your point about distinct types of continuations either. Your continuations that do not expect a value are merely continuations that expect a unit value that is elided at the level of representation.

I agree with your last bit: transforming an expression-based language into a statement-based language is difficult. It typically requires an SSA pass, and the result is rarely human-readable.

But I don't think this is really getting to the original issue, which is that there isn't any fundamental difference in power between a statement language and an expression language. A statement language is merely a syntactic overlay on the expression language.

I'm not sure what you are

I'm not sure what you are getting at. Whether you treat statements as `expressions that return unit' or as a different kind of thing altogether, you still have the problem that you cannot use a statement where a return value is required or desired. And this isn't just a restriction of the surface syntax, it is built in at the bytecode level. If you try to transfer control out of the bytecode that implements the catch block, and you don't leave the stack at the same level as it was when you entered the try, the bytecode verifier will reject the code. This means you have to introduce a temporary variable to hold any value you wish to transfer out of the catch.

Of course the compiler can hide this problem at the surface syntax layer.

I'm not saying there is any fundamental difference in power between statement and expression oriented languages. I'm saying that the biggest difference between the two is that you can trivially model statements by ignoring the return value of an expression, but that it is awkward and painful to model expressions in a statement-oriented language.

Of course this has been known for a long time, and the standard trick for implementors has been to just make everything return a value at the object code level and arrange for the instruction stream to discard the unwanted values. But less well known is that you can't use that trick on the JVM or CLR because the restriction on return values is built in even at that low level.

I'm asking Eli Gottlieb to please avoid that trap and just make everything an expression.

Bytecode is a distraction

The confusing part is the mention of bytecode. It doesn't seem relevant to the debate. The x86 instruction set is full of "statements" that don't modify the stack yet Haskell and Scheme live just fine on my box. Sure the JVM's bytecode for exceptions is statement based, but see my recent post about Scala. One language isn't necessarily hampered by the limitations of languages to which it is translated.

Or is your argument that the JVM makes it harder than necessary for language builders to implement expression oriented try/catch structures? If so, well...no argument there.

My argument is only a tad

My argument is only a tad stronger than you characterize. Languages that enforce a dichotomy between expressions and statements are simply more difficult to work with than those that allow you to mix the two. Most languages that have both expressions and statements allow you to use expressions in a statement context, but not the other way around. Languages that have only expressions have no such problems and are far less awkward to work with.

As far as bytecode, the issue is this: if you look at the addendum to Continuations from Generalized Stack Inspection, you'll see that step 2 of the process is to convert the code to A-normal form. The only reason we do this is to convert all the compound expressions into statements so we can wrap the try/catch around it. When I was discussing this with Felleisen, he suggested that we could bypass this transformation because it was just an artifact of the surface syntax of the Java/C# language. However, you still need to perform this transformation even at the bytecode level. Let me see if I can illustrate. If we have the expression foo(bar(x), baz(y)), it should compile to something like this:

push x
call bar
push y
call baz
call foo

Suppose we'd like to put a try/catch around bar to catch some condition and substitute a default value. We'd like to do this:

push x
try {
  call bar
  goto join
}
catch {
  pop           ;; discard the exception
  push default
  goto join
}
join:
push y
call baz
call foo

The idea being that if bar succeeds, we simply jump to the code after the catch, but if bar fails, control is transferred to the catch clause which discards the exception, pushes the default and transfers to the join point. At the join point, we either have the result of bar or the default on the stack. This seems obvious and natural and is the way this sort of thing is done in Lisp.

This bytecode is illegal. It makes sense, it would work, it's just disallowed. Even at this level you have to do A-normal conversion and introduce a temporary:

push x
try {
  call bar
  pop temp      ;; spill the result into the temporary
  goto join
}
catch {
  pop           ;; discard the exception
  push default
  pop temp
  goto join
}
join:
push temp
push y
call baz
call foo

And the reason you have to do this is simply because someone didn't understand the issues surrounding statements vs. expressions and made an ill-considered choice to make try/catch be a statement rather than an expression.

obligatory xref

Frank Atanassow, who has forsaken us, had this to say on the subject:

Before you go off inventing new programming languages, please ask yourself these questions:

  1. What problem does this language solve? How can I make it precise?
  2. How can I show it solves this problem? How can I make it precise?
  3. Is there another solution? Do other languages solve this problem? How? What are the advantages of my solution? of their solution? What are the disadvantages of my solution? of their solution?
  4. How can I show that my solution cannot be expressed in some other language? That is, what is the unique property of my language which is lacking in others which enables a solution?
  5. What parts of my language are essential to that unique property?