Retrospective Thoughts on BitC

I believe Dr. Shapiro posts here sometimes, and I'm sure plenty of people here have already seen and read the post, but for those of you who wouldn't otherwise see it:

Retrospective Thoughts on BitC answers: What Ever Happened To BitC?

In the large, there were four sticking points for the current design:

The compilation model.
The insufficiency of the current type system w.r.t. by-reference and reference types.
The absence of some form of inheritance.
The instance coherence problem.

Type classes?

After reading the email, I am a little unclear on what all of the issues with type classes were. Was the main thing that there was no clear way to disambiguate multiple instances of a type class? And that reifying the type class into something that could be passed around simply gets you part of the way to objects?

Having taken part in the

Having taken part in the discussion on the bitc list, I'd say Jonathan's concerns are mainly about specialization of type classes. Since this is a systems language, he didn't want implicit runtime allocation of dictionaries, and as he stated in the message, he wanted a simple, sane compilation model. Combining type classes with separate compilation forces either runtime allocation or a link-time compilation model that specializes all the code dispatching to instances.
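
For concreteness, here is a small Rust sketch (traits standing in for type classes; names are illustrative) of the two compilation strategies being contrasted - specialized copies versus a runtime dictionary:

use std::fmt::Display;

// (1) Specialization / monomorphization: the compiler emits a separate copy
//     of `show_twice` for every concrete T, so no dictionary exists at runtime.
fn show_twice<T: Display>(x: T) {
    println!("{} {}", x, x);
}

// (2) Dictionary passing: `&dyn Display` carries a vtable (the "dictionary"),
//     so one compiled body serves every type, but calls are indirect and the
//     dictionary has to live somewhere.
fn show_twice_dyn(x: &dyn Display) {
    println!("{} {}", x, x);
}

fn main() {
    show_twice(42);        // instantiated at i32
    show_twice("hello");   // instantiated at &str
    show_twice_dyn(&3.14); // one body, dispatched through the vtable
}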

Specialization seems to work

Specialization seems to work quite well. MLton does that, C++ does that, .NET does that.

Specialization

But as shap notes, it does require either full source code (as in C++ templates), whole-program analysis (MLton) or at least link-time code generation. It seems like link-time codegen was the solution they were most open to, but if you want to support dynamic linking/loading, it does introduce a lot of hard-to-verify complexity into the runtime.

Anyway, it seems that his biggest complaint about type classes was the inability to have multiple instances for the same type. This is ugly in Haskell as well, although people do seem to get by with using newtype to work around it. He also seems to be just generally allergic to the fact that instance declarations can't be made to follow normal scoping rules. I share his distaste.
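
A rough sketch of the newtype workaround, phrased with Rust traits rather than Haskell classes (the names here are illustrative): you cannot give the base type a second instance, so you wrap it and give the wrapper its own.

use std::cmp::Ordering;

#[derive(PartialEq, Eq)]
struct Descending(i32); // hypothetical newtype carrying the alternate instance

impl PartialOrd for Descending {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl Ord for Descending {
    fn cmp(&self, other: &Self) -> Ordering {
        other.0.cmp(&self.0) // reversed comparison: the "second instance" for i32
    }
}

fn main() {
    let mut v = vec![Descending(1), Descending(3), Descending(2)];
    v.sort(); // uses the wrapper's instance, i.e. descending order of the i32s
    assert_eq!(v.iter().map(|d| d.0).collect::<Vec<_>>(), vec![3, 2, 1]);
}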

Anyway, it seems that his

Anyway, it seems that his biggest complaint about type classes was the inability to have multiple instances for the same type.

I did point out named instances on bitc-dev when this first came up; I don't remember if there was any follow-up from that. For that matter, I'm not even sure why named instances aren't part of Haskell already.

Edit: here's an LtU thread with Jonathan and Bruno Oliveira discussing various type class issues.

Requires full code

Note that link-time code generation requires full intermediate code, because generating performant code requires inlining. Up to inlining, I think the complexity here is containable and relatively easy to verify (either formally or informally); all you are doing is integer arithmetic (+, -, *) [note the absence of divide], typed constant substitution, and bounds check elision where possible.

Once inlining enters the picture, however, you are potentially biting off the full complexity of a JIT compiler.

The problem in BitC is that [on the one hand] ground operators (e.g. '+') are specializable, but [on the other hand] the compiler desperately needs to know at static compile time when that specialization will turn out to be at a ground type and therefore at a primitive instruction. You really, really don't want to have to wait until dynamic code generation and inlining to discover (belatedly) that some particular use of '+' was actually a native add instruction - the cascading consequences of that discovery in terms of the rest of the generated code are really quite brutal.

Note that most of the optimizations you would like to do at static compile time become impossible without this kind of information. Consider, in particular, that generalizing comprehensions means that you can't recognize an integer-indexed for loop early, which makes opportunities for things like loop unrolling moderately hard to recognize.
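
A small Rust sketch of the '+' case (illustrative names): once the generic function is instantiated at a ground type, the generic '+' is just a native add, and that is exactly the fact the static compiler wants to know early enough to plan unrolling and the rest of the loop optimizations.

use std::ops::Add;

// Generic over Add: when monomorphized at a ground type like i32, the body
// becomes a tight loop around a single native add instruction; if that fact
// were only discovered during late (link-time or JIT) specialization, the
// downstream optimizations could not be planned at static compile time.
fn sum<T: Add<Output = T> + Copy + Default>(xs: &[T]) -> T {
    let mut acc = T::default();
    for &x in xs {
        acc = acc + x; // generic '+': resolves to a native add once T = i32
    }
    acc
}

fn main() {
    let v = [1i32, 2, 3, 4];
    println!("{}", sum(&v)); // instantiated at i32
}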

You seem to be thinking of a

You seem to be thinking of a setup where the compiler first does optimization, then emits intermediate code. That is then handled by a JIT at run time to specialize it and further optimize it based on the new information that became available.

The alternative is to give up on separate compilation. There is no separate compiler and linker: you just give the compiler your full source code of the whole program, and it emits an executable.

Not an alternative

While I'm a great fan of whole program compilation, it doesn't scale very well. For our target application space, the ability to support dynamic shared libraries and separate compilation remains a requirement.

What in particular doesn't scale well?

I mean, let's go back in time to when MIT built X Windows. At the time, all they aimed to support was xclock, xload and xterm. The design of X Windows, and how linking to it works to this day, is based around that design mishap - think of situations that arise in practice, such as two third-party vendors deploying an app into a system running X. So I am just wondering if we can "solve the problem by avoiding it" somehow? You know your problem domain way better than I do, so I would appreciate your elucidation.

At the same time, many of your small previous examples of use cases for BitC really got me thinking of ideas for my own language and what test programs I would want to have in my "proof of concept" suite to show off some nifty features. For example, Huffman encoding/decoding and a TCP/IP stack implementation would be two domains where I wanted to show off various abilities of a language. For the former, I was inspired by Hashemian's papers that elegantly showed various trade-offs between time and space for both encoding and decoding operations. For the latter, I remember you mentioning the TCP/IP stack for the Linux kernel in your ASPLOS paper. The latter inspired the former.

Too many lines of code

We have users who are building programs in the 600KSLOC to 1MSLOC range. The time to whole-program compile such an application is measured in hours, but it's only that fast if you have a whole lot of RAM on your desktop machine.

At the scale of assemblies I'm actually OK with whole-program. At the scale of complete applications it's not viable.

False dichotomy

It might be that for your purposes you don't have to choose between whole-program compilation and figuring out how to get decent performance from per-module compilation at link time or after, if you can quickly gather the data you need to do a decent job of compilation before linking.

I recall Manuvir Das giving a talk about doing static analysis as a service for various Microsoft departments, and he seemed to get a lot of useful information about semantics out of programs using just n log n analyses. He said you needed sub-quadratic performance to make things scale to programs like msword. The talk was at PEPM 2000 in Boston, and back then he said msword had a SLOC count of about 5 million.

I guess the crucial questions are (i) can you nearly always get good quality from such analyses, and (ii) how costly will the failures be (where the static analysis leads to more expensive code generation than either full program analysis or a JIT would)?

You can have a quick

You can have a quick separate compilation for development and slow whole program compilation for production. But I think that this is not even necessary for the vast majority of cases if you cache compilations. When you compile a polymorphic function for type T, then you put that result in a global cache. That way the first time somebody uses a List<int> it might take a few milliseconds, but the second time the result is looked up. Such a cache is nice anyway because it makes compilation take time proportional to the changes instead of proportional to the size of the whole program.

In the worst case it is still bad, but so is HM type inference. In practice it will probably be fast enough because this is basically what the CLR does *at runtime*.
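
A minimal sketch of such an instantiation cache (hypothetical names and types; a string stands in for real object code): compiled specializations are keyed by the generic function and its type arguments, so List<int> is only compiled the first time it is requested.

use std::collections::HashMap;

type TypeArgs = Vec<String>; // e.g. ["int"]
type CompiledCode = String;  // stand-in for real generated code

struct InstantiationCache {
    cache: HashMap<(String, TypeArgs), CompiledCode>,
}

impl InstantiationCache {
    fn new() -> Self {
        Self { cache: HashMap::new() }
    }

    // Look up an existing specialization, or compile and remember a new one.
    fn get_or_compile(
        &mut self,
        generic: &str,
        args: &TypeArgs,
        compile: impl FnOnce() -> CompiledCode,
    ) -> &CompiledCode {
        self.cache
            .entry((generic.to_string(), args.clone()))
            .or_insert_with(compile)
    }
}

fn main() {
    let mut cache = InstantiationCache::new();
    // The first use of List<int> pays for code generation...
    cache.get_or_compile("List", &vec!["int".into()], || "codegen(List<int>)".into());
    // ...the second use is a cache hit and costs only a lookup.
    let hit = cache.get_or_compile("List", &vec!["int".into()], || unreachable!());
    println!("{}", hit);
}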

That part not written yet

You may be unclear because the promised note dealing specifically with the type class issues hadn't gone out to the BitC list yet. It may or may not make more sense now that the note is visible.

Rust

Comment by Patrick Walton, a member of the Rust team.

By-ref

I'm not surprised that by-ref parameters produce such pervasive issues. When you combine polymorphism (and especially type inference) with second-class type constructors like that, you are in a mess, because they basically destroy the algebraic structure of the type language. That is almost always going to turn into an expressiveness problem at some point.

I am a bit surprised that language creators still keep copying those bad ideas about references from languages that screw them up. AFAICS, Algol 68 pretty much solved the problem of pointers and references almost 50 years ago. I'd be really interested to know why that approach is not suitable for today's languages.

AFAICS, Algol 68 pretty much

AFAICS, Algol 68 pretty much solved the problem of pointers and references almost 50 years ago. I'd be really interested to know why that approach is not suitable for today's languages.

The problem is that they want to be able to return a by-ref parameter:

by-ref int Foo(by-ref int x, by-ref int y) = if x < 4 then x else y

I didn't think Algol could do that. Basically, by-ref parameters are first-class, and they need a region analysis to ensure the lifetime of the by-ref doesn't escape.
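
For concreteness, here is how a region (lifetime) system handles exactly this shape - a minimal Rust sketch, with a hypothetical foo mirroring the pseudocode above:

// The returned reference is tied, via the lifetime 'a, to both inputs, so the
// compiler's region analysis guarantees it cannot outlive either of them.
fn foo<'a>(x: &'a mut i32, y: &'a mut i32) -> &'a mut i32 {
    if *x < 4 { x } else { y }
}

fn main() {
    let mut a = 2;
    let mut b = 1;
    *foo(&mut a, &mut b) += 1; // increments `a`, since a == 2 < 4
    println!("{} {}", a, b);   // prints "3 1"
}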

Algol 68 can handle that

Algol 68 can handle that:

PROC foo = (REF INT x, REF INT y) REF INT: IF x < y THEN x ELSE y FI;
INT a := 2;
INT b := 1;
REF INT c = foo(a, b);
c := c + 1;
print(a); print(b);

Will print "+2 +2".

Garbage collection

Algol is garbage collected, though, right? Isn't the decision to avoid garbage collection the main thing that makes this hard for BitC or Rust?

Yes, but

It has both stack and heap allocation. Heap allocation has to be requested explicitly. The above example doesn't use it, so it does not involve any GC either.

I was implicitly asking how

I was implicitly asking how Algol prevents you from storing a reference in an object with lifetime greater than the reference's storage. First-class references lead head-first into that problem. GC is optional for BitC, so that's not a workable answer unfortunately.

So what happens when you

So what happens when you start putting references to the stack into the heap?

Dynamic scope checks

Ah, good question. If it can't prove safety statically, A68 does dynamic scope checking when you assign or return something, so that a referee cannot escape its lifetime. Have to admit that I don't know how exactly that is implemented, though.

However, this is not a new problem. You already have it even if you only allow by-ref parameters, at least when you have first-class functions (which I thought both BitC and Rust have). Consider:

f(byref x) = {
  g() = { x := 2 }
  return g
}

h() = {
  local x
  return f(x)
}

h()()

That is, if you want safety, you need some kind of checking anyway, either dynamically or through some ad-hoc static restriction. If you don't want to impose random restrictions (e.g. a function may not access byrefs from outer scope), then a static check pretty much amounts to region inference.
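
For comparison, a small Rust sketch of the same situation (hypothetical names, mirroring the pseudocode above): the closure's type carries the region of the captured by-ref, and the escaping variant is rejected statically.

// Capturing a by-reference parameter in a closure is fine by itself, because
// the closure's type carries the region/lifetime of `x`.
fn f<'a>(x: &'a mut i32) -> impl FnMut() + 'a {
    move || *x = 2
}

fn main() {
    let mut x = 0;
    {
        let mut g = f(&mut x);
        g(); // fine: `x` is still live here
    }
    assert_eq!(x, 2);

    // The analogue of `h` above is rejected statically:
    //
    //     fn h() -> impl FnMut() {
    //         let mut x = 0;
    //         f(&mut x) // error: cannot return value referencing local `x`
    //     }
    //
    // which is the region-inference-style check described above.
}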

Good point

I don't know how BitC avoided this problem earlier. Sandro, do you know?

IIRC, by-ref params were

By-ref params were initially heavily restricted as they are in C#. You couldn't return them as values and you couldn't capture them in a lexical environment. From the retrospective:

The fact that ByRef was syntactically restricted to appear only at parameters had the (intentional) consequence of ensuring that safety restrictions associated with the lifespan of references into the stack were honored - that was why I had originally imposed the restriction that ByRef could appear only at parameters.

Shap found this excessively limiting, though, and pretty much any relaxation of this restriction seems to require regions for safety.

BitC refs were not capturable

The original impetus in BitC for by-reference parameters was the desire for (a) low copy overheads at leaf functions when passing large aggregate parameters, and (b) the need for OUT parameters in certain cases, notably for I/O. The first case [a] is particularly important at the native runtime boundary.

As originally crafted, by-reference parameters could not be captured in a closure, and could not be returned. This prevented upward escape.

I considered declaring that a captured by-ref parameter would be copied at the time of capture (which, for various reasons, is consistent with other aspects of capture implementation). I decided to hold that approach in reserve until we understood what to do with regions, because we knew the region issue was coming.

Reference escape rules

Ah, good question. If it can't prove safety statically, A68 does dynamic scope checking when you assign or return something, so that a referee cannot escape its lifetime. Have to admit that I don't know how exactly that is implemented, though.

If you prevent cycles by stopping byref from referencing existing variables, can't you simply use reference counting?

The scheme I am planning for dodo includes immutable references, arguments-only references and references without cycles (that is 3 types of refs). The args-only refs are reserved for imperative programs and for local assignment.

Well, sure, but

Well, sure, but then you've committed to garbage collection (of which reference counting tends to be the most expensive and least reliable instance).

curious about expensive refcounting

Though I'm not entitled, I'd love to hear reasons why ref counting is the most expensive -- out of curiosity, because I'm surprised. (I care about causes and effects, and less about whether things are sold as cheap or expensive.) I had the idea cost depended a lot on context, i.e., how much one refcounts and/or aggravating factors like threads.

To maximize the cost of ref counting on purpose, I would do the following: 1) count each and every pointer to something instead of borrowing counts I see elsewhere (so it happens far more often than necessary); 2) assume each and every refcount might change in different native threads, so mutation must always be done atomically (which increases cache line cost); and 3) maximize the cost of touching cache lines containing refcounts by doing so with poor time locality, so the odds of a cache line miss are highest.

(To optimize instead, just aim to do the opposite.)

In case some folks have not heard the phrase "borrowed refcount", here's a short explanation. If you need to refer to something refcounted, but you see someone else holding a ref to the same object for you (for lifetime/duration exceeding your usage), let them do it instead and assume you're covered by the other reference. Basically, a counted reference can act like a scope in space or time you leverage as a free resource while you're in that same scope. (And escape analysis is necessary to see you must add your own counted ref if a ref can escape that scope.)
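
A tiny Rust sketch of the idea (the helper names are hypothetical): while a caller already holds a counted reference for the whole duration of the call, the callee can take a plain borrow instead of its own counted reference, so no refcount is bumped on that path.

use std::rc::Rc;

struct Config { retries: u32 }

// Borrows from the owner's count: no Rc clone, no counter traffic.
fn use_config(cfg: &Config) -> u32 {
    cfg.retries + 1
}

// Escapes the caller's scope: must take its own counted reference.
fn stash_config(cfg: Rc<Config>) -> Rc<Config> {
    cfg
}

fn main() {
    let cfg = Rc::new(Config { retries: 3 });
    let n = use_config(&cfg);             // "borrowed" refcount
    let kept = stash_config(cfg.clone()); // escape: pay for a real increment
    println!("{} {}", n, kept.retries);
}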

There exist contexts -- say where other garbage collection does not already exist -- where refcounting is cheaper when an alternative would require pervasive gc. I'm also familiar with an idea Lisp old-timers might recognize: even when pervasive garbage collection is present, manual allocation and refcounting can still be a good idea for performance reasons if the context is right. (Here gc would be just a safety net.)

RC vs GC

I didn't understand all of your question, but the quick answer is: reference counting normally has a cost linear in the number of aliases created to an object (i.e. every assignment you make, or every passing and returning costs). Plus, if you actually have shared state, then the constant factor of this is particularly high, because all counter bumps have to be synchronised. OTOH, with copying or generational GC, the cost is linear in the number of GCs an object survives (which tends to be 0 for the majority of objects). It trades time for space.

This isn't to say that RC cannot be cheaper under certain circumstances, but they are rather rare. Hans Boehm goes into much more detail in these slides.

linear in aliases

Very clear, thanks! I think "linear in aliases" was what I was looking for, matching (1) in my list of cost amplifiers.

Sophisticated reference counting is really complex

Most of this does not apply to deferred reference counting. The real problem with reference counting is the unpredictable cost of deallocation, and of course the inability to collect cycles. Usually a backup tracing collector is used, but I'm not convinced that the perceived benefits of reference counting outweigh the complexity hit from needing both in the same system.

Continuum

Yes, then you have jumped right into the middle of the continuum that actually exists between RC and tracing, and that Bacon, Cheng & Rajan describe nicely.

concur

I'm not convinced that the perceived benefits of reference counting outweigh the complexity hit from needing both in the same system.

This is especially true when the compiler impact of the clever RC techniques is considered. The deferred counting techniques, for example, rely on a sophisticated understanding of the possible concurrent interactions in the program. If you are willing to put this much work into your compiler, I believe there is more benefit to be had by applying that effort to comparable GC optimizations: most notably [non]escape analysis, stack allocation, region tricks, and determining which objects are not reachable from other threads.

Other benefits of RC

There are other benefits to RC. For instance, Tcl has long supported efficient purely functional hash tables and other structures because it knows from the RC whether a structure is unshared (and so can avoid a copy). On the flip side, it does mean that reasoning about performance requires breaking abstraction barriers too often - e.g., trying to explain why somebody's algorithm has turned from linear to quadratic time because they accidentally introduced another reference (leading to horrible abuses of the K combinator to ensure a single reference).
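
In Rust terms, Rc::make_mut gives roughly the same unshared-then-mutate-in-place trick; a small sketch:

use std::rc::Rc;

fn main() {
    let mut v = Rc::new(vec![1, 2, 3]);
    Rc::make_mut(&mut v).push(4);    // refcount is 1: mutated in place, no copy

    let alias = v.clone();           // now shared (refcount 2)
    Rc::make_mut(&mut v).push(5);    // copy-on-write: `alias` is untouched

    println!("{:?} {:?}", v, alias); // [1, 2, 3, 4, 5] [1, 2, 3, 4]
}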

A68, by-ref, and escape

A68 does dynamic scope checking when you assign or return something, so that a referee cannot escape its lifetime.

This is one of several hack-ish solutions (ours being another). The problem with it is that it has the effect of making by-reference types nullable, which defeats one of their main advantages.

I agree it's a hackish

I agree it's a hackish solution. It does not imply the existence of nullable references, though. You should get a runtime error immediately when a reference would be invalidated by passing it out of context, directly or indirectly. Invalid references can never exist.

The only clean solutions are GC or a region system.

Work of the Devil

BitC was specifically contemplated for critical systems codes. In some of those applications, "run time error" may be both morally and practically equivalent to "kill people". I'll grant that "don't kill people" never made the official requirements list for the language. Maybe it should have. :-)

Because of this concern, we worked very hard in BitC to eliminate any possibility of runtime errors. Offhand, I think the only remaining case inherent in the language runtime was the possibility of a bounds violation on the vector [and string] indexing operation, and we were very unhappy about that one. From a programming perspective, the problem is that you really want a way to abstract over collection sizes. The vector bounds issue is merely an example that happens to be built in to the language.

The plan was to make it possible for critical system programmers to eliminate a lot of those by introducing integer literal types in BitC v2. The basic idea was similar to what Mark Jones did with kinding in Habit. He saw deeper into it than we did, subsumed what we had in mind, and we set aside our plan in favor of eventually moving in the direction that Habit had set.
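
This is not BitC's design, but for a rough feel of what putting sizes into the type system buys, here is a small Rust const-generics sketch (names illustrative):

// N is part of the type, so code can abstract over the collection size while
// the size stays visible to the type checker; for each instantiation the
// compiler knows N and can fold or eliminate the index checks against it.
fn first_and_last<const N: usize>(xs: &[u32; N]) -> (u32, u32) {
    (xs[0], xs[N - 1])
}

fn main() {
    let a = [10, 20, 30, 40];
    println!("{:?}", first_and_last(&a)); // (10, 40)
}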

Obviously, no general language can eliminate 100% of runtime errors, nor [in my opinion] should the language seek to preclude the programmer from signaling error conditions by non-local upward control transfer [i.e. exceptions or equivalent]. What the language can, and in the critical system context must, do is ensure that a sufficiently paranoid and careful [group of] developer[s] can write programs that are demonstrably free of runtime exceptions. This becomes very difficult if the runtime system introduces runtime errors whose dynamic non-occurrence cannot be discharged.

Any such discharge is, of course, extremely hard to do. Given which, a language designed for critical systems should seek to avoid introducing mandatory runtime errors at almost any practically tolerable cost.

The other obvious runtime error in BitC arose from storage exhaustion. This one raised fewer concerns in my mind. I decided it was acceptable because (a) truly critical systems don't allocate at runtime and (b) we were already contemplating a tool that would allow the space demands of [very] selectively written programs to be characterized.

BitC is GC'd

BitC is also garbage-collected. I'm not sure about Rust.

Rust isn't GCed

The Rust manual says the default memory management scheme is malloc/free, but suggests it is easy to swap in other schemes.

I think Graydon's reasoning was that GC isn't necessary if you can ensure memory safety by another mechanism, and he didn't like GC for the kind of distributed applications he had in mind. I can't say I really follow this point, since GC seems a good fit for the kind of Erlang-like messaging-only model for concurrency he has adopted.

Rust is GC'd, with exceptions

GC is task-local, and you can avoid GC by using stack-allocated values or unique references. From the manual sections 8.1.1 and 8.1.2, although it is a little ambiguous:

The heap is a general term that describes two separate sets of boxes: shared boxes -- which may be subject to garbage collection -- and unique boxes. The lifetime of an allocation in the heap depends on the lifetime of the box values pointing to it. [...] A shared box may (in the case of a recursive, mutable shared type) be cyclic; in this case the release of memory inside the shared structure may be deferred until task-local garbage collection can reclaim it. Code can ensure no such delayed deallocation [due to GC] occurs by restricting itself to unique boxes and similar unshared kinds of data.

This "mutable shared type" seems to contradict an earlier statement in Section 8.1, "Immutable portions of the heap may be shared between tasks, mutable portions may not."

"Shared"

I assume that in this paragraph, they just mean "shared" in the sense of "aliased," as in C++ shared_ptr, which (while it is threadsafe) certainly does not imply the use of multiple threads. Interpreted this way, the paragraph is reasonable and pretty clear. I'm just speculating, though.

Safety contract?

What's the safety contract in Algol in this case?

The by-ref problems were mundane

Actually, we didn't hit any problems related to polymorphism. The problems we encountered here were entirely mundane. Having taken an initial implementation short-cut that seemed to work, we mistakenly believed that we could successfully defer region support in our type checker. When we later discovered that the short-cut didn't work, we were well down the road and lacked the time to fix the problem.

There is nothing inherent in BitC's by-ref types that cannot be readily solved.