## Fast Compilers

I once worked on an IBM 360 coding COBOL programs, and we were lucky to get two compilations a day. We spend most of our time staring at listings and punched cards trying to find bugs, including syntax errors.

Since that time, compilation times have improved greatly, and computer speeds are stellar. Yet, I sometimes see a comment about compilers needing to be fast.

Is the concern over compiler speeds still warranted, or is it just a ghost from the past? I realize compilers are doing more now than previously. Are they really doing things that take thousands of times more time?

## Comment viewing options

Run a profiler on loading a page on Safari/Chrome/WebKit. More time is spent compiling JS than running it ;-)

### Less facetiously, if we look

Less facetiously, if we look at a lot of the new verification and optimization algorithms, they'll often assume something like an alias/points-to analysis. Such algorithms are context sensitive and generally are not run with more than 3-4 levels of context in *experiments* (whatever that means) -- and, often, that already can take hours.

On that note, one of my favorite papers at the coming POPL is Percy Liang's on using basic machine learning techniques for generating heap abstractions, picking context sensitivity, etc. for different pieces of code to balance performance with quality. Similarly, a few years ago, I predicted the same algorithms will be some of the best beneficiaries of parallelism and clouds (instead of just farming out make & testing) -- and now we're indeed seeing quite a spurt of papers about it.

### Verification and Optimization

A compiler to verify my code is worth some wait. Otherwise, my time will be consumed testing.

Optimization can be postponed for the last few compilation, test, fix cycles.

Please tell me you have a compiler that will fix all my bugs. If it works faster than I do, which is not all that fast, I can retire:))

### Verification and testing,

Verification and testing, today, are merging: see symbolic execution (in particular, Koushik Sen, Dawson Engler, and Patrice Godefroid's papers). It's not a one-or-the-other situation either; employees at Microsoft, among some of the more modern software writers, will use ALL of contracts, fuzzers, type checkers, more general static verifiers, crash reports, and probably other things I'm not thinking of. For many of the tools -- fuzzers (which are really mixed static/dynamic analyses) and full static analyses -- results depend on how much compute time you put into them**.

Optimization might be postponed.. unless you're in a dynamic language. When working with an optimizing compiler -- performance engineers -- compile time is still important: results aren't so much about one-off paper designs but benchmarking and experimenting (a mini scientific process of hypothesis testing and reformulation).

**Better understanding of diminishing returns and triage is a big, ugly problem that isn't discussed enough in academics.

### Verification, testing, and specialization.

Are all those tools built into the MS languages, or are some separate?

MS has a lot of SW to maintain, they could benefit from the best tools.

Last I heard verifiers and symbolic execution can only be partly automated, but I don't keep up with them. Does MS have experts to assist some tools?

### Some are -- contracts, type

Some are -- contracts, type checking. Others are basically integrated into the build are test systems (e.g., rewrite a binary to run cheap race detectors when doing standard unit or scenario testing).

The symbolic execution stuff isn't quite there yet, but, for the last 10 years or so, that's been one of the biggest practical results in verification/testing and has shown up all over (concurrency checkers, security checkers, GUI testing, etc. -- I've even used it for integrating search into applications!). I believe the driver verifiers have basic plug-and-play usage models and Pex is very close (I believe it automatically integrates into VS unit tests). I suspect the hurdle here is that fully automatic testing, except in some cases, isn't exactly the sweet spot for general programs -- some guidance will likely go a long way, but it's not clear what that should be.

### Everyone has an opinion.

You shouldn't have mentioned test automation, because like chocolate I cannot resist..

There should be a coverage indicator that tells what branches of a program have not been tested.

Test artifacts, such as scripts and data, must be stored with the project so the will not be lost, and made clearly available to everyone who works on a program. Testing should be integrated into the IDE, partly because test artifacts must be maintained with programs or they become obsolete. And, finally, the test tools should be very easy to use. A zero-button tester is idealistic, but captures my frustration at developing more test code than deliverable code--all lost when I left companies.

### Static code coverage is a

Static code coverage is a actually pretty weak metric for many analyses. Imagine a race detector: you can run all the code blocks many times and not hit the race. We're really interested in the state space which is generally infinite.

That's essentially why we can always use faster program analysis techniques: they're really search heurestics. Static analysis was historically attractive in that it could guarantee full coverage in a cheap way, but generally with high expense (abstractions that are conservative and thus have false positives and coarse so that we can't express general properties). Advances in SMT solvers, hardware, and higher-level algorithms now make the fine-grained search more tractable... but there's still a ways to go before we get Google-like response times.

### A Metric

The nature of the metric is not as important as measuring a programmer's progress towards achieving a testing goal. Programmers need to report that their program is ready for release. A test metric could remove some of the subjectivity,

### Measuring the success of a

Measuring the success of a dynamic analysis is hard. Consider the race detector: the goal might be something like nine nines of % uptime. A random fuzzer will randomly search the space, a playback tool (say of the last year's usage) will reflect old behavior, etc. Perhaps you mean the testing goal is inherently domain specific... either way, I've always found it to be a hard problem (and iffy when considering all those "we found 100 bugs!" papers: you really only care about the bugs worth triaging, and even if you found ones, you might have missed even more important ones).

### Edit Compile Test Repeat

Compilation speed is still a concern.

We want to avoid recompiling everything after a few changes. We want to link fast, without expensive searches. We want to parse and typecheck really fast, which is useful for supporting IDE integration (syntax highlighting, hover-text, intellisense, refactoring).

I'd even like zero-button testing, where I get continuous results of building and running tests as I type code.

When we consider runtime compilation, i.e. for scripts or code distribution or live programming or deferred code-specialization, it is even more a concern. We want fast and effective optimizers for these purposes.

Compiling is certainly a task that obeys Parkinson's Law. If compiling gets faster, we'll simply compile with greater frequency, for a greater variety of purposes, and find more work for the compilers and optimizers to perform.

### Zero-button testing--YES!

Let's add character by character evaluation to that zero button testing, since we are wishing. In which case, my two finger typing and backspace corrections wouldn't need a very fast compiler.

### Immediate feedback needs to be fast

What we ultimately want is immediate feedback, directly in the editor, on syntax (syntax errors *and* structure of the AST to see problematic disambiguations), types, etc. This needs to be fast.

### The compiler doesn't have to

The compiler doesn't have to be super fast, you could just make it smarter.

By memoizing the type and syntax trees, you can get great interactive performance along with the historical information needed to do decent error recovery. Worked for awhile in Scala, but the Scala team ultimately chose laziness instead (which doesn't provide the same results, but alas...).

### Smarter not faster

Did you say, only recompile changed code? .

### Only recompile the changed

Only recompile the changed trees. If their result requires recompiling other trees (say you modified a definition), then recompile those trees as well. You can't memoize every small tree, of course, but you can pick a nice granularity based on the syntax of your language (e.g., definitions and blocks).

### Immediate feedback needs faster than fsat.

To be that fast will require cloud computing on your desk or a super breakthrough in compiler technology. But, few people notice a millisecond delay, it's OK to be a little slower than immediate:).

### 50 milliseconds is the upper

50 milliseconds is the upper bound of what you can do while locking the user out of editing. 100 milliseconds is ideal to appear interactively responsive with keystrokes.

### Yet, I think that the

Yet, I think that the definition of "immediate" can be stretched a bit further. The notion of "immediate" I have in mind is "when I want the information, it's there". As Sean pointed out, it doesn't mean that it has to be computed instantly, but we could go further and argue that not all 50 milliseconds are equal: there are points in time where the user may want the information to be ready, and other where you can safely assume that he won't need, or even be able to absord some part or all the information you could compute.

For example, when a user is writing a program identifier, there is no point recomputing the type derivation of a program at each keystroke. The type information will only be useful to him once she's done writing the complete identifier. The identifier resolution information, however, might be relevant, as the user may assume she'll get direct feedback once he's completed the full identifier and rely on it while typing.

Sean systematically points out that a kind of memoizing/differentiation/incrementality support can reduce the work to be done by the various language algorithms, and hence alleviate the need for raw speed in the "cold" case.

In the settings where the compiler speed is a matter of user interface, I think further simplifications may come from finer user interface considerations, such as heuristics to know when the compiler feedback is really needed.

### For example, when a user is

For example, when a user is writing a program identifier, there is no point recomputing the type derivation of a program at each keystroke. The type information will only be useful to him once she's done writing the complete identifier. The identifier resolution information, however, might be relevant, as the user may assume she'll get direct feedback once he's completed the full identifier and rely on it while typing.

When you get the identifier correct, you will know as soon as you type the last keystroke. It is a very good feeling when you get used to it, it feels like the compiler is always with you. If its cheap to do (because you've memoized), why not? The old Scala IDE had this, and I thought it was a killer feature, but that whole project fell apart and now I think the new IDE does refactoring or something. The Live Programming version of SuperGlue also had this, but of course that was much more aggressive.

You have to experience it to appreciate it, and I don't think any modern IDE right now can achieve this. Most modern IDEs depend on having a very fast compiler that re-processes the buffer on demand or in the background. Its a poor approximation, but better than nothing at all. There are definitely cases where we don't need compiler feedback right away, but these should be once every 5 minute tasks. The task you perform once every 5 seconds should resolve very quickly, instantly even.

### Heuristics

A thread to analyze the user and one for the keyboard, nice. The next step for the heuristic to watch the web cam to see where the user's eyes land on the screen. Does that mean big brother is watching:)

### That gives us roughly a

That gives us roughly a billion cpu cycles.

### Please never lock me out, not even for 50ms!

By all means, please never "lock out" the user. Make that stuff run in the background, and run the GUI thread with highest priority. Programming is visceral. I'm quite sure that even a 50ms pause can be felt.

### You won't notice it at 50

You won't notice it at 50 ms, but the experiment is easy to perform. You will always be locked out for some amount of time, the UI thread is not multi-threaded, any modification to the UI will necessarily lock you out. The trick is to hide non-UI processing in another thread when necessary, but then you have sync overhead to consider. There is some amount of time where you want to hold the UI thread without disrupting the user, for best effect. Of course, if you go over that time, you want to unlock the UI thread and re-sync later.

### At 50 ms...

"Lock out" for 50 ms is sometimes quite noticable, especially when working with continuous rather than discrete user inputs.

For example, let's say I've mapped a few keys to panning, tilting, and zooming a camera, such that holding the button causes it to continue operation in the associated direction. A lock-out of 50 ms means there could be a time-error of up to 50ms in processing the button-down, and a time-error of up to 50ms in processing the button-up.

This becomes noticable as inconsistency in how the controls react - i.e. sometimes the camera moves a little, sometimes it moves a lot, for equivalent press on the keyboard.

The problem can be alleviated a bit by using some sort of acceleration factor to improve fine control for short taps and twitches, but it would be nice to have it be more consistent fine control all the time.

This seems to mostly be an issue for joystick-like input devices or emulators - I've not in my dayjob heard a complaint of this nature for discrete inputs. But it has motivated my interest in timing 'consistency'.

### Of course. 50 ms makes sense

Of course. 50 ms makes sense only for keystroke input, and if you type very fast it might be too long, you'll have to tweak it down to 40 ms.

For continuous input, you basically have to finish in less than a frame cycle (including UI overhead), which is about 16 milliseconds.

### UI without lockout

It is feasible to keep the display thread independent of the input thread, and independent of the primary processing threads. This is more the direction I've been pursuing recently - albeit, in Haskell, where threads are safe and cheap.

### Well, if you want to go that

Well, if you want to go that deep, rendering is already independent of the user UI in many modern UI libraries, like WPF. So if you've already issued instructions to the renderer, they will be retained and your screen will repaint even if your UI thread is locked up.

DirectX 11 even allows for multi-threaded rendering, but that doesn't apply to a IDE. The main problem would be shared resources, like the editor buffer, logically you can only update that in one thread.

### Yeah, a goal would be

Yeah, a goal would be wait-free concurrency for interacting with any shared resources in the entire application.

The limitations of human perception are quite useful here - i.e. one could continue using the old state for the editor buffer for up to 100ms after a keystroke and nobody would notice, though we probably don't need to wait that long.

Ideally, one can reason about responsiveness across the whole application, rather than just hoping our procedures take less than 50 milliseconds or whatever. Wait-free helps with this, but something more is needed.

### You can run the procedure in

You can run the procedure in another thread and hold the UI thread for 50 ms. If the procedure is not done by then, you go to background mode and process the procedure's results when it comes back. This means you potentially need to reconcile changes with a buffer that has changed, but support for that is already built into Eclipse (if somewhat difficult to use). If the common edit is 50 ms edit. Sort of like skipping frames in a game. If you do that often, the worst case is that you are just like the Java or C# IDEs today (which are already very asynchronous).

EDIT: when you go asynch, you are going to incur about another ~50 ms of overhead to resync with the UI thread. This means that it is useful to hold the UI thread if you can get something done quickly.

### Synch Costs

What synchronization method are you assuming? Because I really cannot imagine a 50 ms (well more than a LAN network round-trip!) overhead for synchronizing information with a UI thread.

Is this specific to Eclipse?

### Its not sunk cost per say,

Its not sunk cost per say, its the latency it takes to get something back into the UI thread. My timings are all pretty approximate, it definitely wasn't instantaneous, you could feel the lag if you ran asynchronously all the time even for operations that completed very quickly.

It might be specific to Eclipse, not sure.

### Sounds architectural, then.

Sounds architectural, then.

If the UI thread spends, say, 60% of its time typically waiting for up to 50ms for operations to complete, then we would expect to see a 60% chance of an 'average' 25 ms delay between making an input available to the UI thread and the UI thread (potentially) processing it. This would introduce something of a self-fulfilling prophecy: you don't make operations asynchronous because it will introduce a lot of delay, and it introduces a lot of delay because you didn't make other operations asynchronous. (The issue would only be exacerbated if you waited 50ms for every asynchronous operation!)

It's fine to do some things in the UI thread that can be completed 'very quickly', but to make the more asynchronous architecture work without noticable lag, you shouldn't be doing anything that takes more than a few milliseconds. The typical delay here becomes the delay it takes the UI to notice any other user inputs and asynchronous updates.

### You are possibly right. The

You are possibly right. The big problem is the event queue. When your async process is done, its results are queued up for processing back in the UI thread. Its not clear when the job is processed, but it doesn't seem to be very timely. It has to compete with the other jobs on Eclipse's worklist queue.

BTW, all the operations were asynchronous once one operation was. There is necessarily only one compiler thread, so any operation that occurs while an async operation is in flight is necessarily asynchronous itself. That is just compiler overhead, all the UI stuff still can only occur in the UI thread.

Very fun stuff. I wish I could go back and get better numbers, and publish a paper on how to make compilers really interactive. Maybe someday, but no time these days (my next language isn't textual).

### Must be eclipse

A latency of 50ms just to exchange data between threads is many orders of magnitude worse than it should be with high performance multi-threading (a quick test on my machine exchanged back and forth between two threads a million times in 6.44 seconds for ~3.22 microseconds per exchange). Of course, with actual UI work you wouldn't get that, but still 50ms is crazy.

### Movies 30 fps

Motion picture frame rates vary, but it looks like 30 fps is about the fastest. That means our eyes see smooth motion at about 33.3 ms per frame.Silent movies were taken at about 16 fps, which is 63.5 ms per frame, but they flicker.

### I believe most LCDs refresh

I believe most LCDs refresh at 60 fps. You could always drop frames of course, but eventually your continuous motion looks more choppy.

### Your statement is a huge

Your statement is a huge oversimplification. It's actually much more complicated; the notional "frame rate" of the eye depends on a lot of different factors.

### Interesting document. I

Interesting document.

I wonder if it'd be cheaper to simulate per-frame motion blur so that users perceive smoothness than to achieve framerates that would hide flickering.

Not that this really applies to IDEs and fast compilers. (Well, maybe it could apply ;-)

### Consistent Delay

I'm interested in exploring whether a consistent but longer average delay would be better for the user experience than is an inconsistent but shorter average delay. For example, would 150±3ms be a better design choice than 75±60ms, where variation is per operation? As latency increases, does relative value change?

This may be a useful question to answer for IDE interactivity, especially if 100ms is proving a bit short for the necessary computations. It's generally easier to add some artificial delay to achieve consistency than it is to eliminate delays.

### If the delay is noticeable

If the delay is noticeable to the user, it will affect their typing rhythm. If you need 100 ms, best to do that in another thread and resynch later, most of the time it will still appear as instantaneous to the user.

The Scala IDE used a background thread for compiler computations, however, it would hold the UI thread for 50 ms for the compiler thread to do its work, and in the common case the compiler could work in 50 ms. However, when the user is modifying the signature of a top-level class definition, there is a lot more work to do, and you would most often slip into background mode.

The live version Superglue did everything, including execution, in the UI thread. That was a mistake, but often the perf was still acceptable, its just some keystrokes would lock the IDE out for a few seconds, which was very unacceptable.

### fast compilers in context

I don't think that compiler speed is terribly important any more. The examples of why it is nominally important given in other comments here refer to quite marginal optimizations. You can stretch a hardware budget by a factor of 2 with a better JIT. You can stretch a grunt programmer budget by a factor of 1.4 with a faster edit/compile/debug cycle. Such things are a John-Henry,-steel-drivin'-man kind of competition between heroic hackers and Moore's law and we're late in that game. Compiler performance is scarcely a hill of beans, in importance, if even that.

### Hill of Beans

If we stretched grunt programmer budgets by 40% in America for one year, then spent all the savings on beans, I suspect we'd get more than one hill.

### bean mounds

Things that stretch grunt programmer budgets by 40% are quite common. Somebody makes a new web framework, a new hot DSL, a new platform, etc.... those do the trick. Moore's law does it, too, to some degree. That's a lot of what has shaped the industry over the past decade.

There's a business problem: it's hard to capture much of that value. (By "capture" I mean to reward the innovator. It's hard to trim $2 off every programmer budget in such a way that you, the clever innovator, get$0.10 of that $2 savings.) The hills of beans exist only in theory. Compiler speed improvements other than breakthroughs in the algorithmic complexity of this or that compete against Moore's law. Additionally, the business value of compiler speed improvements exhibits diminishing returns: once compilation is "pretty fast", making it twice as fast doesn't make it twice as valuable. Compilation speed as something to work on is thus thrice damned: hard to collect money for it; its easier to passively watch hardware become faster; and beyond some point, it's not worth much anyway. There are obvious niches that are exceptions that prove the rule. One can probably make a little money, for a little while, concentrating on JIT performance, for example -- but this is dinky sideline stuff compared to the change from half-day Cobol compilations. ### Moore's law has ended Moore's law does it, too, to some degree. Heat dissipation implies that compilers will not be faster in the future: you can't clock above a few gigaherz and throwing in more cores can only be exploited when your language and programs written are designed for parallel compilation, which mostly tends to be in the hands of the programmer. The 'good' thing is that we can expect to have more memory in the coming ten years or so, but that can only be exploited to little extend (compiling to memory instead of to disk). Moore's end is a real barrier on new language design. The languages developed in the eighties profited most from Moore's law; I recently read an historic remark on a critique on Clean as being too slow and too memory intensive: They claimed to have solved the problem since they could get the runtime to run fast in 2.5MB. Of course, no-one complains about that these days. But nowadays, you can just expect that your compiler will run as fast in the future as it does now. [ And it will need to compete with languages and compilers designed to compile in microseconds instead of minutes. ] I believe there's no way out of this except for studying and improving on eighties technology. ### Nothing New? I suspect there are one or two new language technologies that will obsolete improved 80's technology, for example AI. Assuming we are smart enough to replicate intelligence. ### Better Understanding I suspect there are one or two new language technologies that will obsolete improved 80's technology, for example AI. I don't mean that there's nothing to be found. What I mean is that I think the basics of most technology was developed then and earlier, but are, at the moment, still not very well understood. Take the little example of trying to write a functional language compiler. Type systems more elaborate than your basic Hindley-Milner are not understood very well, in the sense that we don't know how to efficiently type programs with them, we might even not know how to exploit them (I am thinking about impredicative systems here), whether they are anything like the final answer (I still hope that that there is a simple system which just has one quantifier instead of two for impredicative types), we don't even know that they are anywhere near correct (despite having them checked in COQ, it may still be the case that one missed a triviality when it comes to the implementation). I read the latest implementation for impredicative typing implemented in GHC (QML, see the other post), and the only feeling I got out of it (I am too dumb to totally understand it) was this: "Surely this cannot be the final answer." Or take compiling to an abstract machine for a functional language: There just isn't any good answer. There are a pletora of manners of doing it (Lispy intermediate languages, ZAM, combinators, graph rewrite systems, compiling to assembly directly), but there doesn't seem to be any rationale why one should be preferred over the other or what should be the best intermediates to try and do so. Just a collection of experiments, where one turns out better than the other one. Assuming we are smart enough to replicate intelligence. Yeah well, despite AI not having delivered yet what people thought it would bring, there is no reason to believe that intelligence is exclusive to animals, and that -after implementing it in silicon- we won't be humbled by our own lack of it. So, yeah, I hope they'll go ahead and do that. (Though I don't particularly look forward to having a conversation with a moody laptop because I just don't get what it wants.) ### Moore's mechanism vs. law Heat dissipation implies that compilers will not be faster in the future At the very least, very fast compilers will be exponentially cheaper relative to time for a while to come, especially in the common case where compilation can be parallelized. Two examples: 1) The linux kernel takes a bit of noteworthy time to compile on my little desktop machine. It's a heck of a lot faster to compile if I can use a LAN-attached cluster to compile lots of stuff in parallel. It is not so far off that my desktop can be as or more powerful than that LAN cluster, but for the same cost as the machine I currently have. We're hitting process limitations, as I understand it. We're running out of ways to make equivalent dies smaller. We still have many more generations of Moore-ish improvement by making dies larger and growing them in three dimensions. 2) Someone mentioned the desire for more and more real-time feedback in IDEs. This, as well, is a naturally very highly parallelizable version of "compilation" (even though it isn't necessarily quite compilation per se). Moore's end is a real barrier on new language design. It's funny you say that because my view is that many language theory / design / implementation types don't take Moore's not-dead-yet status into account nearly enough. But, we wander off here into the purely speculative.... I believe there's no way out of this except for studying and improving on eighties technology. Gosh, you mean you don't believe ever-more-complicated generations of CPU and software that, as the general rule, is by design too complicated for any individual or group of people to actually understand as a practical matter? Why, you heretic :-) ### Heretic! Gosh, you mean you don't believe ever-more-complicated generations of CPU and software that, as the general rule, is by design too complicated for any individual or group of people to actually understand as a practical matter? Why, you heretic :-) I want all my students to be clear, practical, and critical. Of course, that means being a heretic and I am a heretic myself. ;) But, seriously. I really think the end of Moore's law means the game has changed. In the eighties, you could rely on the fact that microprocessors would catch up with efficiency and resource consumption issues. This is no longer the case. Instead, new languages will need to compete with established languages which have had a lot of time optimizing away problems, and also had the benefit of having been developed on less powerful machines, so that they now run efficiently. I think, by extension, the new languages which will emerge will be the ones which are better designed at exploiting the efficiency limitations of the machine (like, for instance, C -with its long history- is), just by being a bit 'smarter' than what was common wisdom before. [ Which would mean Go should become a winner in my crystal bowl. ] Maybe language design will collapse under these new requirements and at some time people will start wondering why the hell other languages found it smart not to have forward declarations like C, and implemented very computing intensive type systems instead of doing it dumb but fast. [ Another example may be having interface files like C or ML. I did away with them like Java as 'unnecessary cruft which bothers programmers since they need to copy what they meant.' But they are great for fast parallel compilation since you can compile every module with respect to the interface file, instead of with respect to a -maybe yet unchecked- source code file. C, of course, has interface files partly due to the fact that compilation on old microprocessors was slow, and therefor, better implemented with an incremental process. Maybe, we'll see a turn-back to these old wisdoms, but just because programmers will be more and more 'spoiled' by the fastest tools at their disposal. ] ### Prediction I believe that flat file program store is inefficient, and source code should be served from a code base that is indexed as necessary to manage and manipulate code efficiently, including versions and many other attributes. . ### Moore's end is a real Moore's end is a real barrier on new language design. I believe the new directions and potential 'end' for Moore's law has been and will continue to be a huge opportunity for language, OS, and hardware designs. In some ways, Moore's law has hindered interest in alternative technologies. Why develop a language that ensures responsiveness, when these procedures will be fast enough anyway? Why develop a language that effectively leverages parallelism, when my program will double in speed without any extra work on my part? Why design a language for energy efficiency, when we'll soon have small dies that do just as much for a quarter the cost? Why develop a language that leverages cloud compute resources, when my desktop has become a supercomputer of its own? That said, I don't believe Moore's law will end any time soon. We have plenty long to advance in the 'more cores' direction, and in the 'specialized cores' direction - i.e. for putting code on the sound-cards, on the network cards, on the memory bus, on the memory. We'll need well designed languages to effectively leverage these. Even for the single-core, I suspect we'll see many more breakthroughs that shatter previous limits. Switching to diamond semiconductors instead of silicon is a very promising direction for heat dissipation problems (diamond can get very hot without damage, and conducts heat very well). I wouldn't be surprised to see great advances in light-based compute technologies. Parkinson's law tells us we'll find ways to use every bit of this extra performance and still find it wanting. I understand that holographic video could easily run a terabyte per hour... Anyhow, I see no need for pessimism about Moore's law. What I do see a need for is language designs that help solve some of the more painful problems - such as leveraging GPGPUs and FPGAs, ensuring responsiveness and progress, supporting embedded systems, managing energy consumption, coordinating remote systems and services, fusing data into rich perception, handling the flash-crowds problem, developing scalable and persistent and sharable and secure user interfaces, resilience and graceful degradation in the face of disruption and node failure and partitioning, and scaling to a large number of developers (so we can have wiki-sized programming projects). However, I don't really see how Moore's law helps solve many of those issues. Most invocations of Moore's law that I've ever heard are excuses to procrastinate architectural redesign. ### Nice Synopsis Thanks, Money:) I'm just having fun. ### Is the concern over compiler Is the concern over compiler speeds still warranted, or is it just a ghost from the past? I realize compilers are doing more now than previously. Are they really doing things that take thousands of times more time? Basically, yeah. Many compiler optimization problems can be phrased as computing the transitive closure of a graph -- and that has cubic time worst case. So by going from analyzing one basic block at a time to analyzing a whole procedure (ie, go from analyzing 2-3 lines of code to 20-30 lines of code), that factor of 10 increase can take a 1000 times as long. In fact, the ratios are worse than that, because basic blocks are straight-line code, and there are often clever optimizations based on that fact. Furthermore, these days people want to write very short procedures, and want compilers to do interprocedural analysis to make that efficient. So not only does the problem get bigger, it also gets harder. Our main consolation is that writing fancy compilers is fun. :) ### Realtime compilation We want realtime compilation for our IDEs. This isn't so much about counting passes but memoization and localization. Programmers want the gamer experience. ### forget realtime compilation Interpreted languages and realtime user-oriented (not compilation-oriented) analysis are what you want for an IDE. One strategy for achieving that is to make a "super fast" compiler but there are other strategies that are arguably more attractive. ### O.K. code generation / O.K. code generation / optimization might not be of importance and can be delayed until the "run" button is pushed. This is not true for all other source code analysis phases usually subsumed under "compiling" as well. ### Error recovery Some error explanation algorithms try to find small changes to the program that will make it "more correct": "maybe you forgot a parenthesis here", etc. This means lots of trial-and-error in a quite large search space, and require efficient operations (or approximations). It can take place at the syntax level, but also for type-checking. ### Or you could just memoize Or you could just memoize and when something breaks, limit your search to what just changed. ### Better error messages are Better error messages are worth waiting for. ### You can often have your cake You can often have your cake and eat it to. Why compromise? ### mistake mistake ### Distractions Where I currently work we have a huge c++ project and even though we use a distributed compilation tool, it still seems unbearably slow sometimes. The problem with slow compilation is not so much the down-time but rather the distraction of it. As soon you hit the compile button your mind starts to wonder and then it takes you another minute or two to get your head back in the game. I've found myself longing to use the Go compiler quite a few times. ### What was I doing? Yep, people can loose their train of thought in just a few seconds. It is much better to have 5 second (or less) compilations. ### No time for a hill of beans. wrong reply button !&@!!! Compilers on my machine are so fast there is not time to put one bean on the pile before they finish compilation, but once there was time to make a small hill--back in the "good old days." I love IDEs and modern compiler speeds. Beats the hell out of paper tape and punched cards. ### 1000 x longer That makes sense. Thanks. Once coding was fun, but it grew familiar as our kids and grand-kids grew up. ### Almost all compilers fail on performance Programmers demand and expect efficient tools, and there is little reason to, for instance, give them more expressive power if that costs efficiency. Nowadays, a programmer expects his tools to compile small programs in seconds. At the same time, when writing a compiler, it doesn't work to invest a lot of time in the efficiency of it. Most compilers start out slow, the best you can do, I think, is to design your language and algorithms to run linearly in the size of the code. [ And let your user-base optimize the compiler, once you get to that point. ] Most compilers never make it into mainstream since they are not efficient. This is especially true for compilers which spend a lot of time on static checks, or even code generation. It is a hard problem, as far as I can see, most compilers take ten years to mature. A few examples. It took Caml ten years to gain a factor 500 in speed compiling to native code instead of to the ZINC machine. It took GHC at least ten years to win a factor, say 20, on the Hugs interpreter, and then another ten years to gain a factor 1000 by compiling to native code instead of to the abstract G-machine. It seems Clean was one of the few to hit a sweet spot, since I think it has been running on efficient ABC code for the last twenty years. I have a hard time understanding why ABC code is efficient; there is no design document. (All factors are guesstimated.) Then there are lots of 'failures.' Last I checked, MLTON takes about a quarter of an hour to self-compile, which kind of defeats its purpose as a tool. [ And they put about a decade of research and a dozen PhDs into that. ] The Helium Haskell compiler is, after ten years, still very slow. Most Lisp compilers died partially due to their inefficiency. Actually, most interpreted languages seem to bypass this problem by being written in C and having reasonable speed interpreting code; most interpreter speed is 'by definition linear in the code.' But it then often becomes a hard problem to write native code compilers for them given that often design decisions then tend to abuse the specifics of an interpretative run-time. It is a hard problem. I actually regret a bit bootstrapping my language instead of having written an interpreter in C. ### Success Is Difficult Efficient code is though to write, especially for a program as complex as a compiler. My experience is outdated, being mostly a mainframe programmer, and haven't been paid to write or maintain any objects. Currently coding with gcc, and it is very fast, as are VB and Javascript. I've felt most productive writing some scripting languages (e.g., TACL on HP NonStop(tm) OS on Tandems), because they were interpreted and lazy typed. The downside is not being able to compile that code into C or machine language. True, some features cannot be compiled, but that's probably 2% of total code; the other 98% could be compiled to C, but there was no translator to do it. I think TCL handles compilation well, the interpreter runs programs, but calls compiled code when possible and desirable. TCL lets you have your cake and eat it too, if I understand correctly. However, I enjoyed most assembler language programming. Hehe, we made a mess before structured programming. Unfortunately, it takes a bundle of code to make a simple program in assembler, but when you make a Fortran II compiler process 5000 lines per min on a 2.5 microsecond cycle-time machine, you are a happy camper. Our production compiler processed about one card per sec. Unfortunately, assembler is not practical compared to C. ### It's more about generated code Efficient code is though to write, especially for a program as complex as a compiler. It's more that efficient code generation is difficult to come up with; or to come up with an efficient scheme. Most people invariably get it wrong, and a lot of compilers are successful out of plain luck, or hard work (or copying a proven design, forgot that one; but there's little fun in that). The Caml compiler is an example of luck. Leroy build an abstract machine which was as fast as a native ML compiler at that point, and subsequently got the time to improve on that. The GHC compiler is an example of hard work. Peyton-Jones once remarked that their early compiler was 'naive' in the sense that they had a parallel machine but it made little sense to exploit that parallelism since the performance of G-machine interpreted programs was about a factor 1000 slower than the equivalent in Pascal or C. Actually, to get Haskell programs to perform, you still need about all tricks in the book (both in the programs as in the compiler), and it still is often slow in comparison to Java or C programs. But, nowadays, that counts for less given the added expressiveness. Javascript is an example of copying a design. I gather that a compiler for that is mostly using all the lessons learned with Lisp. [ I guess the same holds true for Java. In some sense, it exploited two things: That C-like imperative code can be compiled efficiently (experience from Pascal/C/Algol), and that virtual machines can be implemented efficiently (from Lisp). ] ### Great Info Thanks ### The Caml compiler is an The Caml compiler is an example of luck. Leroy build an abstract machine which was as fast as a native ML compiler at that point, and subsequently got the time to improve on that. I'm not sure I'd call the Zinc AM design "lucky". He published a detailed manual describing every design decision he made in the ZAM, why it differed from the convention wisdom (or agreed with it), and how it improved the performance without losing much expressiveness. ### No doubt Leroy is intelligent, but so are others His design improved on a number of abstract machines and pain-problems with them. IIRC, it is mostly about handling closures. Still, he was lucky in the sense that his scheme worked out so well, and I think you can read that between the lines in his comments on the ZAM, comparisons to other machines, and later remarks on his road from abstract machines to concrete machine code, that he just managed to hit a sweet spot at that moment in time. That's luck. He also didn't get it right in one turn, and whereas he did, there are myriads of abstract machines in that point in time which just failed. (MLTON comes to mind.) In the end, I think the problem -generating fast code for functional languages- is still not very well understood, and partly obfuscuated by the fact that microprocessors are optimized to handle C translated machine code. ### His design improved on a His design improved on a number of abstract machines and pain-problems with them. IIRC, it is mostly about handling closures. He made numerous improvements to the runtime. Eliminating excessive closure constructions with a new type of function application was a big one in cutting down allocation, but so were tupled type constructors (they were generally curried like functions at the time), and a simple memory-conservative type representation which folded the sum tag in with the runtime type tag. This large reduction in overall allocations was the biggest win, and this was an explicit goal he set out to demonstrate. I'm not sure these algorithmic optimizations qualify as luck in hitting a sweet spot. His optimizations would be just as important today as they were then. Some might consider it lucky that these optimizations occurred to him, but I don't see how that's any more lucky than Ritchie and Kerninghan inventing C. There were plenty of other intelligent people then too, but they didn't invent C. Regarding generating fast code, I think OCaml has demonstrated adequately that a simple compilation model, like the ZAM, can be efficient enough if your language abstractions are close enough to the metal and you generate good machine code. If you start getting higher-level, like Haskell, then efficient compilation starts getting murkier. ### I would call that luck too but I don't see how that's any more lucky than Ritchie and Kerninghan inventing C. There were plenty of other intelligent people then too, but they didn't invent C. Why C and not Pascal/Algol/Fortran/Modula/Ada/...? Yeah, I believe Ritchie and Kerningham had the wisdom and the luck to have hitten a sweet spot. From a language perspective, and this has been stated over and over again, C is a pretty nasty horrible language. It has a pretty obscure syntax, the semantics is not entirely understood, and it lacks lots of safety nets found in other languages. But they hit a sweet spot. C matches the generated machine code very well (which must have been wisdom too, I imagine them to have been very good assembly programmers), manual garbage collection turned out to solve 99% of most programmers problems, their manner of handling forward declarations was also just good enough, and the fact that you can bit twiddle every value has been exploited for efficiency on so many different accounts that it just ain't funny anymore. The macro system also was sufficient, and it compiles fast. Moreover, it became the de-facto standard to write any OS in, which is lucky for them, and therefor a lot of applications were also written in C. (Which means they also got the bonus for free that microprocessor developers optimized machine instructions 'specifically' for C compiled programs; but conversely, most microprocessor optimizations -like virtual memory- could be taken advantage of, or manipulated, in C programs too.) Most decisions they made made it an excellent system programming language long after, and for different rationale, than they originally intended. I don't think they imagined that C would be used to compile hundreds of millions of lines of code systems. Similar for Leroy. Of course he improved on other machines, but so did a lot of equally intelligent people. I doubt he could have predicted the then excellent running time of the ZAM, especially compared to the then popular native code compilers. If other ML compilers would have compiled to more efficient machine code at that point in time, he would just have had a slow compiler. Both factors are luck at the right time. (I think Leroy admitted such also one time in the sense that it took him twenty years to understand why ZAM is an eager variant of the Krivine machine. Of course he made rational decisions, but he lucked out on the fact that it worked that well where others didn't. He, like the most of us, just also has a hard time understanding why certain machines work, or why certain optimizations work, and others don't.) Moreover, subsequent optimizing Caml compilers compile to native code, not to ZAM code anymore; and in doing so, Caml programs have grown factors more efficient. Also, I wouldn't call Haskell more abstract than Ocaml. the biggest difference is eager vs lazy, but both type systems are pretty complex and both have roughly the same problems, getting under/over-saturated function calls to perform fast, handling the problems of lexically scoped values and garbage collection, and mapping computations to a 'von Neumann' register machine optimized to handle C's calling convention and other idiosyncrasies. [ By the way, I agree on your comment of ZAM efficiency. But my personal aim was to see how one can compile to C trivially without an abstract machine. Though I might have failed at that, not sure actually. ;) ] ### But they hit a sweet spot. I But they hit a sweet spot. I agree, but that doesn't mean they hit it accidentally. I rather think it was mostly deliberate. Ditto for Leroy's ZAM. As for "Why C and not Pascal/Algol/Fortran/Modula/Ada/...?", this has been discussed to death elsewhere. Because of free or readily available compilers, UNIX, performance, control over low-level representations, portability, and so on and so on. Many reasons why C dominated during this time. I think Leroy admitted such also one time in the sense that it took him twenty years to understand why ZAM is an eager variant of the Krivine machine. He explains this in the ZAM report, though it didn't take nearly 20 years IIRC (it's in Chapter 3 somewhere). Of course he made rational decisions, but he lucked out on the fact that it worked that well where others didn't. I don't think others were optimizing for the same factors he was, which is why they didn't "luck out". In other words, it had little to do with luck but had everything to do with having the focus on performance on "microcomputers". All existing MLs required lots of CPU and memory on big workstations. This was actually the first of three motivating factors for a new ML implementation that he describes in Chapter 1 of the report. Also, I wouldn't call Haskell more abstract than Ocaml. the biggest difference is eager vs lazy Laziness, type classes + extensions, and kinds definitely make Haskell more abstract than OCaml. OCaml has caught up with its recent extensions yielding convenient GADT support, but Haskell is still ahead IMO. ### They borrow from each other... Recently, Peyton-Jones implemented this in GHC: http://research.microsoft.com/en-us/um/people/crusso/qml/. Developed in Caml Light for ML systems, checked by COQ, and yet another attempt at getting impredicative typing right. Not sure that this proposal managed to get it right either in ML or Haskell. I don't think either is ahead of the other in anything. Even, or maybe: especially, at the fundamental level, there's lots of stuff which isn't well understood. I see more experiments in Haskell, that's true. [ The comment of Leroy on understanding Krivine wrt ZAM is in some powerpoint on implementing OCAML, I lost the reference. ] [ With regard to efficiency and luck. Yeah well, I think it is luck, even if you can expect something to be faster, you don't know the factors until after you tried and a lot of optimizations fail to deliver the intended speed-up. Or a lot of people think they do something smart when they are doing it. The G-machine was once considered efficient and was designed that way, I doubt Peyton-Jones still thinks about it as an efficient machine. ] ### Where's el-vadimo when we need him? The comment of Leroy on understanding Krivine wrt ZAM is in some powerpoint on implementing OCAML, I lost the reference. 16 years, not 20. And for what it's worth, I'm inclined to agree with Sandro and give more credit to cleverness than to luck. Leroy's humility is charming, but it may be naive to take it at face value... That said, the specific design change he's been talking about in the pages prior to that comment has been re-invented many times, I'm sure, probably by everybody who ever tried to make a fast interpreter for a fully-curried CBV lambda calculus. ### That said, the specific That said, the specific design change he's been talking about in the pages prior to that comment has been re-invented many times, I'm sure, probably by everybody who ever tried to make a fast interpreter for a fully-curried CBV lambda calculus. That's what I figured. Concurrent (re)invention is very common when faced with similar challenges. Different pressures or goals will lead to different results, which is what I chalk up previous "failed" languages by intelligent people (which probably achieved the goals they were meant to prove, but no more). I think Leroy's luck was being in a position to offer a platform that a larger body of programmers happened to be interested in, rather than any sort of luck at actually figuring out how to implement it. ### that counts for less given the added expressiveness I think that is a major argument against worrying about faster compilers. Programmers are the ultimate instant gratification freaks but we need to be weaned off this requirement. With an expressive language much time is spent *thinking* about design issues, not coding, so compile times just aren't so important. ### This sounds like the This sounds like the opposite direction interaction-wise of what we want: instead of me sitting and thinking, and then dictating to the machine, I want the machine to augment my thinking abilities. I want it to take my code or architecture and see where it goes (or doesn't). I want to sketch something out and get feedback on it. Interestingly, adding expressiveness (e.g., underspecified programs) seems pragmatic for such a vision. ### There will always be two There will always be two extreme schools of though: one who thinks programming is primarily an activity that requires lots of thinking and design up front, where you write your code perfectly the first time because you thought a lot about it before had; and two where programming is mostly performing art, where you design and think on the fly while doing. Both are valid schools of thought. But you might think of applying the former school by prototyping a lot of options in the latter school. ### Two different goals I'm not sure it only depends on the person doing, even if personal preference and experience are important. I would rather say it depends on the thing to be done and point in the creation process. For large, complex things that will need to evolve easily, you need a bit of design -- or its very costly. There are other things where spontaneous tinkering work best, for example when you need to know what can be done to refine your requirements. I'm not sure it's fair to call the second form "art". I don't know much about artists and won't try to pretend I do, but I've noticed in museums that for some worksl, e.g. some of the Da Vinci paintings, there are often a quantity of sketches and plans available, that looks like up-front design and prototyping of independent parts to me. ### It might also depend on the It might also depend on the person doing. Sherry Turkle's work on bricolage comes to mind, where she finds that women (vs. men) perform much better when programming is interactive rather than done with lots of up front thinking. ### Bridging extreme domains I've always been fascinated by programming as art, especially in such forms as digital DJs, on-the-fly composition of video, writing up quick plans to coordinate robotic systems, combine sensor data. Ad hoc queries on databases are fine places where iterative programming with rapid feedback are excellent. Writing up quick business reports, heat maps, 'live' dashboards are excellent domains for prototyping. On the other extreme there are life-critical or emergency systems (medicine, aerospace, vehicle brakes, traffic control, weapons control, telephony), expensive simulations (15 hour turnaround), framework or API design (which can paint you into a corner with backwards compatibility issues), and systems needing a lot of optimization (such as database, video-game, or multi-media engines). These domains don't exist in a vacuum, of course. We want dashboards to observe and control mission-critical systems. We want our digital DJ to control a down-to-the-metal multi-media systems, and perhaps even have the composition compiled to that level. IMO, new (general purpose) programming languages should be designed with the goal of bridging the extremes. The polyglot separation of 'scripting glue' languages and 'systems programming' languages is a problem we should be attempting to solve. I don't believe combining the different schools of thought requires sacrificing expressiveness. But it may require avoiding language designs involving whole-program analysis after local edits or many whole-program edits. For that reason, I have doubts about the viability of dependent typing, sub-structural typing (linearity, region inference), Hoare logic. As an alternative, I currently explore simple typing plus architectural approaches to expressing and controlling invariants: oracles, object capability patterns, sealer/unsealer pairs, 'layering' within the language (e.g. procedures can call functions but not vice versa) or simple effect typing. Unfortunately, these alternatives don't expose user-level invariants to the optimizer. The language may provide invariants such as purity in certain layers that the optimizer can leverage, but that would only capture a subset of domain invariants. For getting that down-to-the-metal performance, even for the rapid prototypes, I'm fond of the possibility for runtime specialization (via under-the-hood staged compilation and/or JIT) where a closure can be specialized to its arguments. Using a little reactive invalidation, we can also extend specializations to current state of a program, i.e. so slow-changing pieces can be compiled down. In context of embedded DSL and explicit specialization, we'd have:  interpret :: MyDSL -> Args -> IO () myDSLCode :: MyDSL myProg = specialize$ interpret myDSLCode -- special tag 'specialize' that operates on closures specialize :: (a -> b) -> (a -> b) 

Ideally, the language would reduce the constant myProg, effectively compiling it into a regular function. Explicit specialization isn't always ideal, but it lets us control space costs. (It might also be useful to have separate tags for static vs. dynamic specialization. A program without any dynamic specialization tags can omit the compiler from the runtime's footprint. A program that would contain static specialization tag at runtime results in a compilation error.)

With regards to the OP: fast compilers are important for all of this, both rapid feedback and runtime specialization. But the design of the language will have a significant effect on amount of recompilation and whole-program analysis, and thus upon effective speed of compilers. A language designed from the start to effectively leverage staging and runtime specialization (especially of temporary state) will probably look different than a language not so designed.

To the extent we build and popularize tools that bridge the extreme domains and support developers of both classes, we may find that there are no longer two schools of thought. I suspect the dichotomy exists in part due to the polyglot approach today where we use different languages and artificial separation for different parts of deeply connected problems.

### (Even complex) types systems are compositional

But it may require avoiding language designs involving whole-program analysis after local edits or many whole-program edits. For that reason, I have doubts about the viability of dependent typing, sub-structural typing (linearity, region inference), Hoare logic.

I'm not sure I follow you. Even dependent/linear type systems are compositional, so that for example modifying the type of an identifier will only require re-checking the subparts of the program that are typed with that identifier in their environment -- of course, if you don't control the scope of the declarations, this can be larger than you would like.
Inference systems may have a non-local behavior, but you don't necessarily need them, or (if you want them) can restrict their scopes.

### Manifest typing is axed if

Manifest typing is axed if you plan to support the scripting, glue, rapid prototyping, or live programming domains. Even the simplest type systems you can imagine would depend heavily upon inference and the occasional annotation. A rich type systems will rely more heavily upon inference than a simple one - those rich types tend to be complex and involved, not something people will often want to write down.

So I am assuming inference is necessary if you want both static typing and to support those domains.

Restricting scope of inference is a possibility. There may be a balance you can strike between annotation and inference that hackers would easily be able to use. However, you'll need to assume that the annotations written by programmers will mostly be simple ones. It should help to support partial annotations, such as DDC does, allowing most of the type to be inferred. But if the scope of inference is large, whole-program analysis will be common. If small, you won't be getting much advantage from the type-system in the first place - not without a lot of whole-program edits to tweak the type annotations.

Consider, as an example issue for scoping inference, the exception tunneling pattern in Java. Presumably, this could be avoided by inferring exception types. However, to make it work would require inferring exception types even across the visitor-pattern interface.

I think a lot of examples for type inference, if you support it at all, will end up needing to cross interfaces and frameworks. I've seen claims of using dependent typing for handshake protocols and message serialization between reader-writer monads. The type complexity of data plumbing involved in such a claim is tremendous (e.g. lists where every element is a different type mutually depending on the type of elements in another list). Making everything work across those data plumbing layers is difficult, and not very compatible with tightly scoped inference.

I don't think rich types will allow us to bridge the extreme domains or schools of thought - not unless we carefully keep those rich types optional and pluggable. But optional, pluggable types may not have any semantic effect: no typeful programming, no type-based dispatch. I do approve of optional, pluggable types... but I don't really consider them part of the language. They fall more into the class of extra-lingual static analysis tools like lint - not really part of the language, but still useful to integrate with the IDE.

(I have further objections to nominative typing based upon issues of modularity and reuse (see ban on imports) and my own interest in distributed programming. However, nominative vs. structural typing is not directly germane here except insofar as annotating structural types requires some extra consideration. I think the above argument is sufficient.)

There are object capability patterns to enforce linearity or regions, if we need those properties in a robust, non-local manner without whole-system analysis. There are patterns for handshake protocols that work with simple types, too. We don't need rich typing to enforce most invariants, and I think we'd do well to avoid depending upon them.

### Bootstrap on the right target

I don't regret bootstrapping Virgil at all. I wish I'd done it sooner. I wrote a decent interpreter in Java that interpreted a tree-based IR. That was about a factor of 5 slower than Java with -Xint. Still, it was fast enough for me to bootstrap a new compiler that generates Java bytecode directly. Now it can recompile itself (20,000 LOC) in less than two seconds. When I was bootstrapping using the interpreter, it was on the order of 40 seconds. Productivity has increased a lot (but debugging is harder--I still don't have full source information when compiling to Java bytecodes, but so far Virgil is extremely deterministic, so it's no problem to debug on the interpreter).

Actually, the new compiler has an interpreter too and it's more convenient to skip the compile step and just run programs in the interpreter.

### Bootstrap from Interpreter

I plan to follow the same path you did: write an interpreter (I chose Haskell); write a compiler, interpreter, and runtime in my own language; then bootstrap using the interpreter. I think that's a very sane route to take.

Marco, with his Google-search-unfriendly language 'hi', took a real programmer's approach.

Jumping straight to the compiler seems painful and masochistic to me. Maybe it's because I'd need to write a lot of runtime code, since my language uses a new paradigm pretty far removed from its host. Or maybe I'm just a sissy.

### pyramid for evidence

dmbarbour : write a compiler, interpreter, and runtime in my own language; then bootstrap using the interpreter. I think that's a very sane route to take.

Good idea, because in the context of a tool's ecology, you need a test for everything to provide evidence that each part of a tool's stack does what is expected. Specifically, every tool needs a test. Since tests are tools, they need tests too. (A test only demonstrates what it claims if it actually does what you think, for which you need evidence.) You get a pyramid from this recursion, grounded in simple tests with self evident behavior. But you have to actually look at what happens, so you know.

When you try to write software at a different point in phase space, too far from what is normally done, few existing tools are suitable for this pyramid of support. For example, async code typically suffers from lack of sufficient tools in local testing ecology. So you need your own, if you plan to diagnosis what goes wrong when it does.

Thus more parts and stages in your development plan give more places you can examine for evidence about behavior. You want to be able to reason about a compiler's behavior based on interpreter behavior, for example. You might test by comparing results of interpreted and compiled code for equivalence, modulo time behavior, etc.

About ten years ago I thought it would be cool to permit any function to be either interpreted or compiled, so cactus stacks of both can be interleaved arbitrarily in continuations. This makes lazy compilation easier to study, and makes debugging more facile. My mental model was that code can be "typed" by what engine executes it, so calling a continuation invokes the right engine. Among other things, this lets you compile code in different ways, say with different VMs, that can inter-call each other without problem. But it requires a common, mutually compatible runtime.

Lately I just worry about the runtime: how you'd write code manually in a runtime without your own language, either interpreted or compiled. Then an interpreter and/or compiler would be optional extras you can use as part of a testing discipline. (That is, when testing you want to interpret test scripts, but production code does not require that language.)

### Re: runtime first

how you'd write code manually in a runtime

Your statement connotes writing code against a well-defined runtime API rather than a specific implementation, and I totally agree. What I've been doing since about July is developing and documenting an API (in terms of type classes and type families), and providing a preliminary implementation atop the IO monad. I fully intend for this runtime to have quality and performance enough for production use, independently of the language. That is, my RDP paradigm (specifically, its temporal variation) will have a reference implementation, suitable for exploring or extending the paradigm, even if my language bombs.

My interpreter will target this runtime API. I'd like a typed tagless interpretation, based on Oleg's work. A high quality interpreter also serves effectively as a benchmark for success of the compiler...

But I expect to have a whole new 'runtime' for the compilation target. Why? Because certain issues will be much more important in the compiled form (such as providing my own allocator, threading, garbage collection, networking, FFI integration, annotating profiles and optimizations), and certain other issues will be much less important (such as conveniently integrating Haskell types and functions), and the change in priorities suggests a change in the runtime API.

an interpreter and/or compiler would be optional extras you can use as part of a testing discipline

A runtime might assume that protocols, disciplines, and properties are enforced by static analysis at the language level. To whatever extent that is the case, I don't believe that the interpreter and/or compiler are 'optional extras'.

### Compiler-only bootstrap

I've developed two (toy) Lisp compilers, one targetting JavaScript, and the other targetting C, both without an interpreter. (Compiling to high-level languages makes this much easier, and maybe even possible.)

I find that keeping interpreters entirely out of the picture has many benefits: the logical separation between compilation and runtime can't be blurred (neither by design, nor by accident); there's no possibility for subtle semantic differences between compiler and interpreter; generally a lot of insight into interesting issues, such as reflective towers.

(If anyone's interested I'll try to describe the bootstrap process I've developed.)

### Well yeah

Well obviously if you already have a working Lisp implementation then you don't need to write another one to get started with your compiler implemented in Lisp.

### Sure

What I did though was to write the compiler in the target language (JS, C), keeping it so simple that it's not too painful (only small set of core forms like IF, SET!, DEFINE, DEFINE-SYNTAX, ..., and primitives for manipulating syntax objects), and arranging things so that the language can be grown from that set of forms using (procedural) macros - which are already compiled by the compiler.

(Why use the target language for writing the compiler? Gives me a warm fuzzy feeling to not require an additional tool.)

### Why use the target language

Why use the target language for writing the compiler?

Is this really a bootstrap, then? My understanding of the term is that a language is 'bootstrapped' after its compiler compiles itself. This requires that the compiler definition be written in the compiler's input language, or a subset thereof. The trick is making this happen the first time, which usually involves an implementation in some other language.

I.e. in your case, the compiler to JavaScript would be written in Lisp, but you could always compile your compiler into a JavaScript program. At that point, running Lisp would only require a JavaScript interpreter.

Lisp is already bootstrapped, so you don't have any challenges in that regard.

[edited for clarity]

### I'm used to a more liberal definiton of bootstrap

but I think your definition has value, too. It invariably requires a throwaway program though, and the resulting compiler invariably requires an earlier version of itself. These are the reasons for developing my alternative approach.

[rewritten & misunderstanding removed]

### First Time

The first time I heard of compiler bootstrapping was to eliminate assembler code. It is a two step project.

1) Code a compiler for language "X" in assembler and assemble it to get a running X compiler. Assembler language was used, because the other choices were COBOL and Fortran.

2) Code a compiler in "X" to comile X, and get it running. Thereafter, maintenance costs are tied to coding in X.

Hopefully, X is a high level language that is less expensive to maintain than assembler, in which case the project manager disposes of the X compiler written in assembler language..

### Not sure I recommend it. Too Painful

In hindsight, I found it (bootstrapping) too painful an experience and I am not sure I would recommend or consider doing it like that again.

I started out writing an interpreter in ML, and found out you really need to jump trough hoops to get that to work. I ended up partially compiling the code, which -since I work on it occasionally- ended up costing about half a year- and obfuscuated the bootstrap process. (Devising my own abstract machine for that didn't help too; I wanted to show SECD/ZAM are overly complex and all one needs is just one stack.)

Expressing the compiler in Hi was easy and immediately shaved of 30% or so of the code size w.r.t to the ML version, so that was a good thing. Getting it to bootstrap afterward was a tedious process of some months since I have a scheme with a minimal runtime and no real abstract machine. Compiling to an abstract machine like ZAM might have been less error-prone.

Moreover, stuff is not easily fixed in a bootstrap compiler (which compiles to C). Unless your language, and your compilation scheme, is really fleshed out, there just isn't a lot of room to experiment. So, well, I am just kind-of stuck there at the moment.

The bad point of bootstrapping is that, yeah, it's a magical moment, but you don't know what you end up with right until the last point, the bootstrap, and it may be painful to fix stuff afterwards. (Which is a bad thing since I am the experimental type, I actively try to do stuff simpler and/or different.)

In hindsight, I regret not having build a Hugs-like top-level in C (despite the pains of expressing a compiler in that language) since you just get more time experimenting with stuff and most things remain easy to fix.

Sometimes, I really get the feeling I took the longest route to the slowest compiler possible. ;)

(It ain't that bad, just unsure where to take it from now. The semantics is pretty close to ML, the least I know is that everything is fixable.)

### Semantics

In what form had you specified the semantics of your language ? In my experience, writing the interpreter (which can be considered a big-step operational semantics) is the good time to check that the former specification was correct or, more likely, to refine it due to spotted defects. Issues when implementing the compiler might denote issues in the semantics.

### Turing Complete?

Does compiler bootstrapping assure it to be Turing complete?

### No. A language can have a

No. A language can express a compiler for itself without being Turing complete. Trivial proof: a total, high level language can compile itself to a high level language by use of the identity function. ;-)

Even non-trivial compilers are feasible. Compilation can leverage properties in the target language. A language that can interpret a complete instance of itself will be Turing complete. (OTOH, a total language could interpret itself to an arbitrary fixed depth, which is often more than enough expressiveness in practice.)

### Eager Evaluated Lambda Calculus on Records with RTTI

I didn't really had problems getting the operational semantics right. The real semantic pain point in it, atm, is the interplay between types, interfaces and terms. I still want to get the typing system right, the simple HM inferencer I have now suffers from the normal problems with HM/impredicative types which I want to get rid of. (I want to get rid of impredicative HM style inference with a new scheme for handling unification; a few weeks ago I just decided that HM is the wrong approach to take.)

As far as the operational semantics go: I started out with a simple AST evaluator, which wasn't sufficient to bootstrap a language (it was too slow and went into too deep recursion for OCAML to handle). Subsequently, I implemented the semantics on a simple stack machine to deal with the recursion (after which OCAML had some problems with native routines going into deep recursions handling strings).

After that I experimented a bit with different schemes of translating to C, first a direct translation of the AST and after that a thunk based approach for an eager language. The thunk based approach is as fast as different approaches to translate LISP, but suffers from an explosion in size, a large constant, in the translation from combinators to abstract assembly. That large constant, and the fact that the C compiler cannot optimize the interplay between different rewrites, affects the code size, the memory usage and the running time of the resulting programs.

The real problem seems to be that the distance between an abstract LC and a concrete machine is just big and there is no common knowledge how to bridge that gap best [at least, I don't know any]. One way of looking at ZAM/G-machine/ABC-machine approaches is that those keep that distance (the semantic gap) small by targeting an abstract machine close to it (where interpreting that is faster than running naively compiled code). Maybe it is just a problem of naive LC vs combinator/thunk based approaches, it may be that GHC (which seems to produce an awful lot of code too) has similar problems, and that there are other better approaches. (OCAML's native code compiler seems to result in less code than GHC, but I don't know what approach they took in that compiler.)

It's a puzzle, I really don't know how to get the mentioned constant low at the moment.

(I am pretty sure I got the operational semantics right, that was the easy part. It gives the same results on a naive recursive evaluator, a stack based evaluator, an AST based evaluator, and a thunk based evaluator. The problem is that with an LC semantics translated to combinators you end up with expressing everything in relatively small rewrite steps, which -translated naively- are just slow expensive steps in comparison to the imperative first-order constructs of C. I.e. bootstrapping means billions of rewrite steps of combinators, where if each of the rewrite steps takes several hundred/thousand of cycles, just gives a slow process. [Especially when the compiler burns too many of those rewrites on processing lists of characters.] )

### The real problem isn't

The real problem isn't compilation speed of small programs, it's the build time of huge systems. Even with build farms, some giant apps still take minutes or hours to build from scratch. When incremental build stretch into minutes, you've got a real time drain.

Compilation speed for C/C++ applications has less to do with compilation speed than IO performance, header file organization, and build system speed.

Compilation for Java and most other modern languages is mostly a non-issue.

### Not entirely true

Compilation speed for C/C++ applications has less to do with compilation speed than IO performance, header file organization, and build system speed.

Like you, I thought C would be fast. Hell no! It takes GCC about ten minutes to compile my generated code. No idea why [except for that I generate a lot of C; one of the things you only find after you have bootstrapped it].

### Smaller Files

Smaller and less complex files help compilation time during development.

My interpreter will be limited in size by providing plug-ins that have the same interface as built-ins.

The interpreter will not change much once plug-ins can be loaded. And, some day, it would be nice to have an Ameba to C translator that makes plug-ins from Ameba code. With partial evaluation and compilation into plug-ins, Ameba will run as fast as necessary.

Does TCL have a TCL-machine code compilation cycle?

### ...include files

Compilation speed for C/C++ applications has less to do with compilation speed than IO performance, header file organization, and build system speed.

More precisely, C/C++'s pre-historic separate compilation mechanism results in compilation costs that are potentially quadratic in the number of files.

### ...why?

Why would a quadratic cost be ascribable to separate compilation?

### Transitive inclusion

Because of header files that may have to transitively include others for transitive imports. In the worst case, for each module you compile, you end up recompiling the header files of all previous modules. Even "precompiled headers" generally can only spare you the very first stages of parsing, because macros can interfere arbitrarily with all later phases.

### True

This is one reason I believe flat files are inefficient program storage.

### I agree, somewhat

I agree, they are somewhat inefficient. But of course there are some important advantages to having programs as text-on-disk, primarily that the program is tool-agnostic, i.e. isn't tied to whatever environment created it.

Computers have so much memory and disk these days that the threshold for a "small" program that can be completely kept in memory and recompiled from scratch in a handful of seconds just keeps increasing. With IntelliJ for example, 100KLOC projects are no big deal at all. It does have persistent caches, but it basically loads and indexes your project upon startup, which only takes a few seconds. Builds are incremental after that, and typically super fast.

### Why

A relational database is application-agnostic, why should a program-base be any different. Clearly, compilers and interpreters would need to interface with it as they do to databases. Part of that interface could be to save a set of precompiled headers in the program-base. Another function of a program-base might be to facilitate searching, for example by multidimentional expressions.

### I've heard of people using RDBMS

I've heard of people using RDBMS representations for code in the past, especially for graphical programming languages and structured editors. However, in the cases I know the authors eventually provided a flat-file representation (XML in one case, a home-made language in another). I guess people don't feel as comfortable treating the RDBMS schema as something unlikely to change and suitable for code distribution between organizations.

### I gave RDBMS and

I gave RDBMS and multidimensional expressions as analogs to a program-base. I expect someone will analyze the kinds of searching needed by meta programmers and develop a program-base. Maybe it looks like an AST.

### Or an OODBMS? But actually,

Or an OODBMS? But actually, I'm sure the ultimate structure will look more like BigTable, or some other NOSQL variant. In this case, you depend more on really powerful search algorithms than hierarchical or nested organization.

### Why so? Andreas complained

Why so? Andreas complained about a semantics issue -- the use of #include directive for modularity -- and that's not a code representation issue. Different strategies exist for separate compilation, that still operate on text files and doesn't have those defects.

In essence, programming with "flat files" is just programming with separate pieces of program text, each one with an adress -- the filesystem path. Your language define how those different pieces are handled, but the design space is very rich, and I don't see what a different representation would bring *on that particular issue*.

### so Performance

My suggestion was to make compilations quicker. When a person builds a large system e.g., Linux and Windows, source inclusion files are compiled again and again. My thought was to precompile parts of programs, for example save the results of lexical scanning and context free results from parsing in a structured form within a server. Then, compilers and editors can access the server to get partially compiled code and source respectively.

Presumably, serving precompiled code would save compilation time.

That's what I was referring to as "precompiled headers", and is done by a number of C/C++ systems. However, due to C's context-dependent grammar as well as the unstructured semantics of its preprocessor macros, both of which interfer with included files, that approach becomes very complicated and still does not solve the general problem.

### Standards => black hole?

You seem to be saying that pragmatic decisions by the C standards committee have resulted in a language that eats CPU cycles like a black hole eats mass and energy.

I infer that this inability to precompile headers is not systemic to all languages, and that different decisions by the standards committee may have avoided it. Is this true? If so, can/has the standards committee make/made changes to shorten compile time without developers doing major rewrites?

### Backwards compatibility

The use of textually included header files for separate compilation was a central design choice that dates back to C's infancy. Other languages made other choices. By the time of standardisation (1-2 decades later), too much of C and its libraries relied on it (esp wrt macros). I believe there have been proposals for proper module systems for C, but getting rid of headers would necessarily break backwards compatibility big time.

### I remember seeing a study a

I remember seeing a study a few years ago where they discussed a module system for C, and they analyzed a large body of C code. They found that the vast majority of C code (> 90% IIRC) followed an idiomatic module convention using either abstract types or exposed types. CMod might have been it (LtU discussion).

Macros are definitely the tough nut to crack for speeding up C header inclusion though. Macros can depend on compiler-defined constants, so you need to capture environment bindings in any cached result.

### macros etc

I think the issue of "macros" and "typedefs" are minor problems Andreas: the main problem is much simpler. In C, every function declaration can be compiled independently of every other function declaration. Every function definition can be compiled independently of every other function definition (but depend on all the declarations).

All fine. The problem is that with expanded types, type definitions must be ordered, and, you can't include a description of some struct in a pre-compiled interface, without also including the descriptions of every struct it is composed of. Similarly, you can't compile a function declaration without knowing the complete description of the types of its arguments.

So it is the types that prevent separate compilation of headers.

In C++ it is much worse because of inline functions and templates.

In Ocaml, all types are a single word, so there's no need to worry: no interfaces need to know anything about a type, it's only the definitions that require information (such as what fields exist in a record).

### Not specific to C

The problem is that with expanded types, type definitions must be ordered, and, you can't include a description of some struct in a pre-compiled interface, without also including the descriptions of every struct it is composed of. Similarly, you can't compile a function declaration without knowing the complete description of the types of its arguments.

Sure, you always need all structural type information during compilation. But in general, you need that information anyway when code is using a type, think e.g. datatypes in ML.

This is merely a variant of the problem of importing transparent type definitions (type aliases) from one module into another. If your language allows that, then you cannot generally support fully separate compilation (where modules can be compiled in any order), but only incremental compilation (where compilation of one module may need to access type information from another compiled module). This is the case in many languages, e.g. OCaml, Modula, Java, etc.

So, no, I don't consider that a specific problem with C. It does not induce the problem I was talking about.

### Ah Yes, Compiling Linux and other big systems.

Making large systems is the domain where compilation speed is valuable. The big question is how shorter compile times affect an economy and enterprises therein. There are many unknowns, including the sum of all time saved by workers due to faster compiler times.

Of course, that makes little difference to me. Programming puzzles motivate me.

ahhh!