Jonathan Blow's next foray into game language design

As of about a week ago, Jonathan Blow (creator of well-received games such as Braid, as well as various PL efforts) has a pair of talks on twitch.tv (both later put on YouTube) in which he issues a call to arms for game developers to design a new programming language around their concerns. He discusses what he sees as some of the shortcomings of existing languages that are supposed to be C++ replacements, and some of the requirements he feels game devs have. There was a Q&A session as well.
The talks are more practical than PL-theoretical, but interesting (and occasionally frustrating) nevertheless.

Relate

I'd love to see this related to Lambda Aleph.

Hm

Lambda Aleph seems miles away from being usable for game development. I don't know what the "next foray" is (I just don't watch videos; is someone planning on doing a transcript?), but it is likely to be rather distant from "a fully-dependent type theory, only with more semantic subtyping".

I watched the video

tldr; Programming languages should use curly braces, but not semi-colons.

OK, now I've watched the video (skimmed)

OK, now I actually have watched the video. Here are some poorly formatted notes:

Can't be a 'big agenda' language, which I interpret to mean that it can't push the envelope too much in unproven directions. Examples given:
- Purely functional everywhere.
- Buffer overruns are impossible.

Why not:
- Go: GC, nonstarter
- D: Too much of C's baggage to justify not just using C
- Rust: Too 'big agenda' in the sense above

Lower friction:
- Joy to program in
- Performance
- Simplicity (but not to the point that you're working around the abstractions)
- Designed for good programmers

He talks a good bit about finding 85% solutions (solving the 85% case but not trying to force everything into the mold).

He doesn't like RAII or exceptions. Considers memory very different from other kinds of resources because it's ubiquitous.

He wants pointers and proposes an owned pointer syntax. Didn't see a comparison to Rust (edit: nevermind, he discusses briefly in the Q&A past the end of the talk).

He wants to get rid of constructors (just use functions for that).

Wants to build in a syntax for strings that share memory.

Get rid of header files.

Wants to support refactoring.

Likes Option types, but not a full algebraic datatype that you have to unpack to use.

When he talks about concurrency, it sounds like he wants more safety features.

Doesn't like implicit conversions.

Wants serialization support without C++ hacks.

Program should specify how to build itself.

Permissive license.

Better / no preprocessor.

His goal seems to be pragmatism / fixing pain points with C++ rather than something more ambitious. I was hoping for some gems but was disappointed by the specific proposals. EDIT: I should also say that, as a former C++ games programmer, I agree with most of his general sentiment.

Tim Sweeney is much better

Tim Sweeney is much better about this.

Rust: No big agenda

Isn't one of the major points of Rust not to have any new features?

A region system is quite a

A region system is quite a big feature...

"Good programmers"

I skimmed through the first half hour, but he seems trapped in a C/C++ mindset, without much imagination or desire beyond fixing obvious problems with those. And he thinks a game language should be designed "for good programmers", not to protect against mistakes.

Well, a good programmer is one who realises that he will always be a bad programmer with respect to making fatal mistakes.

Without knowing exactly what

Without knowing exactly what he meant, my take is that a language for good programmers should, just like any language, make the easy, default way of doing something safe and protect against mistakes. However, in recognition that you sometimes have to do weird things, it should also allow for the unsafe, albeit not in a way that can happen by accident.

For example in modern C++, where you don't use C-style casts anymore, you can still use reinterpret_cast and const_cast etc., however it's not something that you will type by accident, and when you do use it, it sticks out like a sore thumb. But it's there for the few cases that you really need it and where the alternatives would be much more cumbersome.
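To make that concrete, here is a minimal, hypothetical sketch (the function and variable names are invented for illustration) of how the named casts stand out compared to a C-style cast:

    #include <cstdint>

    void example(const int* shared_config, std::uintptr_t raw_address) {
        int* a = (int*)shared_config;                      // C-style cast: silently drops const, easy to type by accident
        int* b = const_cast<int*>(shared_config);          // named cast: the rule being bent is spelled out and greps easily
        float* c = reinterpret_cast<float*>(raw_address);  // reinterpreting a raw address; sticks out in review
        (void)a; (void)b; (void)c;
    }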

We all make mistakes

The point is not that "good programmers" don't make mistakes; Blow is overt about that in one of the videos. The point isn't even really that "good programmers" make different kinds of mistakes than novices.

Really, the point is to be aware of the kinds of mistakes practicing programmers do make, and their associated costs. Then, when looking at design choices intended to protect against mistakes, we must weigh the benefit of the mistakes we are ruling out (or at least diminishing) against the cost of whatever we are adding to the language.

A key part of the argument is that oftentimes we pay the cost of a language design choice all the time, and not just in the (few?) times when it actively saves us from ourselves. Blow refers to these kinds of ongoing costs as "friction."

If the integral of the "friction" costs over the lifetime of a project is greater than the cost of the bugs it prevented, then having that degree of protection was actually a net negative.

Furthermore, there may be cheaper ways to mitigate certain classes of mistakes without complicating the language design. If my debugger/runtime quickly catches my attempts to access already-freed memory, then maybe I don't need a complicated region system or garbage collector to save me from that particular mistake.

mighty big

"If my debugger/runtime quickly catches my attempts to access already-freed memory, then maybe I don't need a complicated region system or garbage collector to save me from that particular mistake."

that's a mighty big 'maybe' in my experience. :-)

To each their own

A heap implementation like Microsoft's debug heap, which fills memory with bit patterns to represent different states, can be surprisingly useful. If you see 0xCCCCCCCC or 0xCDCDCDCD in your debugger, you probably have uninitialized stack or heap memory, respectively. If you see 0xFEEEFEEE, you are reading from an already-freed allocation.

Subtle memory errors are still possible, but very simple measures can make the common cases easier to detect.
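As a rough illustration of the technique (a hand-rolled sketch with hypothetical wrapper names, not the actual Microsoft debug heap, which does this automatically and with its own patterns), filling memory on allocate and free is only a few lines:

    #include <cstdlib>
    #include <cstring>

    // Hypothetical debug wrappers: fill new allocations and freed blocks with
    // recognizable patterns so stray reads show up immediately in a debugger.
    void* debug_alloc(std::size_t n) {
        void* p = std::malloc(n);
        if (p) std::memset(p, 0xCD, n);   // "allocated but uninitialized" pattern
        return p;
    }

    void debug_free(void* p, std::size_t n) {
        if (p) std::memset(p, 0xDD, n);   // "freed" pattern; reads after free are obvious
        std::free(p);
    }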

Game developers have overwhelmingly decided that they *don't* need garbage collectors or region systems (no "maybe" about it). They argue that the net cost of development is lower with an "80% solution" that catches their most common mistakes, rather than a 100% solution that brings along additional complexity or "friction."

not all game developers

John Carmack, who might know a thing or two about games development, seems quite interested in functional programming for games. He started porting Wolfenstein 3D to Haskell. He was also playing with Lisp. His conclusion so far seems to be that stricter is better, with Haskell for full-on game development, and Lisp more for game scripting and modding. He seemed to think the enforced purity of Haskell was preferable to ML, because people will always take the easy solution if the language lets them. Of course he hasn't finished porting Wolfenstein, and I don't know what kind of performance he was getting.

It's not just their problem

Game developers may have "decided". At the same time, there is an increasing amount of heat from gamers over unacceptably buggy and crashy game releases these days, including AAA titles.

"80% solutions" in your dev lab don't help customers whose computer you crash later, or whose security you compromise with buffer overruns. In my experience, they also don't help the fact that such bugs can still take ages to reproduce, hunt down, and fix.

Give developers a few more years and complexity cycles. After all, there once was a generation that had overwhelmingly decided that you don't need anything but assembly language.

Game developers have

Game developers have overwhelmingly decided that they *don't* need garbage collectors

Is this really the case? Many systems used in game development (Unity, Kismet, and Processing, for a few examples) are garbage collected. Many game scripting languages (e.g. Lua) are GC'd.

I realize that there are some vociferous objections to GC from those who develop on the bleeding edge. But I wonder which fraction this really amounts to, and what portion of objections would dissipate with sufficient exposure to real-time GC or benchmarking GC or the like.

Re: John Carmack - Yes,

Re: John Carmack - Yes, Carmack is bullish about pure functions, and about reining in the use of state. He has previously advocated for taking a more functional approach to programming in C.

Carmack's point is much more about being careful about side effects, and does not directly argue for things like GC or "safe" language features. His position is very much in line with Mike Acton et al.'s idea of "data-oriented design."

Blow's second talk in fact references Carmack directly, specifically this post on how the "conventional wisdom" of procedural decomposition can actually lead to worse code quality.

Perhaps a direct quote from Carmack might be in order (from the first link):

I do believe that there is real value in pursuing functional programming, but it would be irresponsible to exhort everyone to abandon their C++ compilers and start coding in Lisp, Haskell, or, to be blunt, any other fringe language.

Re: game developers and GC - It is often worth talking about the distinction between the engine language and the game language. You might implement game logic in Kismet, or Lua, or the C# part of Unity, but the underlying game engine is almost always in native "unmanaged" code.

Game engines are often clear examples of Greenspun's Tenth Rule, but what many people fail to see is that this is not a compelling argument that the Unreal Engine should have been written in Common Lisp. To say so is akin to saying V8 should have been written in CL.

Just because many game engines embed GC'd languages does not make a compelling case that the engine can or should be written in those languages.

Re: "give developers a few more years" - This is more or less the point of Blow's talks. These developers actively *want* better languages for the tasks they perform.

The problem is that they've tried out a bunch of the languages that others tell them are better, safer, or more advanced, and found them wanting.

Re: "80% solutions don't help your customers" - The thing that helps your customers even *less* is if you never manage to finish your game because the "100% solution" language you wrote it it imposes too many burdens - either in development time, or run-time performance.

Certainly this comes down to different programmers, in different domains, with different experiences. I expect that game developers would appreciate it if we didn't assume they were stupid and/or lying about their experiences when they relate them.

game engine developers

There are some interesting arguments in favor of using GC'd languages even for game engines, e.g. if you wish to include a compiler in your engine for on-the-fly generation of shaders, or if you plan to use a lot of procedural generation. In my own experience, writing compilers without GC is painful, and procedural generation benefits from thunks and closures and such.

Anyhow, do you mean to clarify your earlier position as: "Game engine developers have overwhelmingly decided that they *don't* need garbage collectors"? I'm willing to grant that position.

Though it remains unclear to me that game engine developers are justified in their overwhelming opposition to GC. My impression is that a few bad experiences with GCs and allocation models not tuned for gaming have led to a lot of premature generalization about GC.

Agree on correction/clarification

Anyhow, do you mean to clarify your earlier position as: "Game engine developers have overwhelmingly decided that they *don't* need garbage collectors"?

Yes, that is probably what I should have said ("engine" in place of "game"). Thank you.

There are some interesting arguments in favor of using GC'd languages even for game engines, e.g. if you wish to include a compiler in your engine for on-the-fly generation of shaders, or if you plan to use a lot of procedural generation. In my own experience, writing compilers without GC is painful, and procedural generation benefits from thunks and closures and such.

I've worked on the shader compilers inside GPU drivers, where using GC would be frowned upon. The usual practice is to set up a fixed-size arena of memory for the compiler to use, and then only let it allocate from that arena. At the end of compilation, you just recycle the whole arena.

So long as your compilation doesn't churn through allocations too fast, you might never need to "collect" (just let your garbage leak until the end of compilation). If you need to be smarter, you might just implement an ad hoc "copying collector" by doing a deep copy of your program representation from time to time (maintain two sub-arenas).
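For readers unfamiliar with the pattern, here is a minimal sketch of such a fixed-size arena (a bump allocator with a wholesale reset; the class and member names are illustrative):

    #include <cstddef>
    #include <cstdint>

    // Bump-pointer arena: allocate by advancing an offset, never free individual
    // objects, and recycle everything at once when compilation is finished.
    class Arena {
        std::uint8_t* base;
        std::size_t   capacity;
        std::size_t   used = 0;
    public:
        Arena(void* block, std::size_t bytes)
            : base(static_cast<std::uint8_t*>(block)), capacity(bytes) {}

        void* alloc(std::size_t bytes, std::size_t align = alignof(std::max_align_t)) {
            std::size_t start = (used + align - 1) & ~(align - 1);  // round up (align must be a power of two)
            if (start + bytes > capacity) return nullptr;           // arena exhausted
            used = start + bytes;
            return base + start;
        }

        void reset() { used = 0; }  // "collect" the whole arena in one step
    };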

Though it remains unclear to me that game engine developers are justified in their overwhelming opposition to GC

I can't pretend to be up to date on the latest in GC, but I think the demands from games are more than just the old saw of performance (although that is probably the most important complaint).

1) Predictability. A GC'd heap should ideally have similar characteristics to a non-GC heap in terms of how many cycles it takes to allocate or free. It is not okay if the average case is good, but sometimes allocations lead to long pauses.

When evaluating algorithms for use in games, worst-case guarantees usually matter much more than average-case.

2) At the end of the day, a GC frees you from the task of keeping things from being deleted, but gives you the new task of *ensuring* that things get deleted. A dangling strong reference in the wrong place (e.g., a listener) might create a leak.

From my limited experience with GC'd languages, this is a real problem, one that only gets magnified if the GC infrastructure gets used to "finalize" any non-memory resources.

Azul VM

I'm no expert on game development (or, I'm rather clueless), but I know one or two things about soft real-time GCs, which is relevant to point (1).
Tuning a GC (for Java developers) apparently costs millions of dollars. What's the measured slowdown for JVM-based game engines?
Moreover, what's the slowdown with the state of the art in soft real-time GCs, namely the ones invented by Azul (running on x86)?
Until I see a benchmark (not that I've been looking), I'll guess these benchmarks weren't done because the platform isn't so widely deployed. But let's *please* not invent a new language if the matter is "just" deploying the state of the art.

Unlikely to Get Good Data

JVM-based engines

That's the challenge right there. There just aren't a lot of JVM-based engines out there, and I doubt you could extrapolate from those to any kind of conclusions for native-code engines.

At a higher level, though, a GC is all about solving a problem that these developers just don't find to be that big of a deal in practice. And no matter how well that GC performs, adopting it involves some trade-offs.

If I want to use a chunk of write-combined graphics memory as a ring buffer for streaming dynamic vertex data (where vertices might be of many different types/sizes), how exactly is adopting a GC going to make my life easier?
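For what it's worth, here is a rough sketch of that ring-buffer idiom (illustrative only; a real engine would also track GPU fences so the write cursor never overtakes data still being read by the GPU):

    #include <cstddef>
    #include <cstdint>

    // Streaming ring buffer over a fixed block of (e.g. write-combined) memory:
    // hands out variably sized, aligned chunks and wraps back to the start.
    class StreamRing {
        std::uint8_t* base;
        std::size_t   capacity;
        std::size_t   cursor = 0;
    public:
        StreamRing(void* mapped, std::size_t bytes)
            : base(static_cast<std::uint8_t*>(mapped)), capacity(bytes) {}

        // Reserve space for 'bytes' of vertex data; returns nullptr if it can never fit.
        void* push(std::size_t bytes, std::size_t align) {
            if (bytes > capacity) return nullptr;
            std::size_t start = (cursor + align - 1) & ~(align - 1);  // align must be a power of two
            if (start + bytes > capacity) start = 0;                  // wrap around to the beginning
            cursor = start + bytes;
            return base + start;
        }
    };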

Generational hypothesis

There is good evidence to suggest that most programs obey the generational hypothesis in that most objects die young. This has turned out to hold extremely well across different programming languages and programming paradigms. Moreover, there is a power law that governs the distribution of object sizes, and it seems most programs reflect that as well.

Excessive object creation

Surely this is caused by excessive object creation. I tend to look at objects created/destroyed per second as a kind of metric of inefficient code. It's much more efficient to create a single object and re-use it through iterations than to create and destroy it every cycle (I think most compilers should lift the object creation/destruction out of the loop in any case). Return objects can also be optimised out by the compiler.

In high performance code there are often no objects created or destroyed. The space for them is pre-allocated in a vector/array, and this remains constant throughout the execution.
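A trivial sketch of that style (a hypothetical particle example), where all storage is created up front and the hot loop allocates nothing:

    #include <cstddef>
    #include <vector>

    struct Particle { float x, y, z, vx, vy, vz; };

    struct ParticleSystem {
        std::vector<Particle> particles;   // allocated once, up front

        explicit ParticleSystem(std::size_t count) : particles(count) {}

        void update(float dt) {
            for (Particle& p : particles) {   // no new/delete inside the frame loop
                p.x += p.vx * dt;
                p.y += p.vy * dt;
                p.z += p.vz * dt;
            }
        }
    };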

Escape analysis

It's much more difficult, perhaps impossible, for compilers and/or runtime systems to perform escape analysis when you do reuse. Escape analysis enables scalar replacement and stack allocation, the first of which is far more efficient because it avoids materializing the object at all. You should write code that is the cleanest and most obviously correct, and often that means small, immutable objects.

A blanket statement like "It's much more efficient to create a single object and re-use it through iterations" is just not true if it defeats escape analysis.

One allocation always faster than many.

I am talking about removing all allocations and deallocations from the time-critical section; what could be faster than that? You don't need to do escape analysis, because the whole region is freed in one go when you are finished.

Zero allocations is even

Zero allocations is even faster, which is what scalar replacement gives you. And why muck with regions manually if you can have the compiler infer them?

What Language?

Because the compiler does not guarantee to optimise, whereas doing it manually does. Benchmarking clearly shows the performance advantage. With performance critical code you profile each change and only keep those that improve performance.

The theory is very nice, but in practice it just doesn't work. If you have some language you think I should try that will have performance as fast as C++ or Ada for high performance applications like a Monte-Carlo simulator, I would be happy to be proved wrong. I would happily port the benchmarks.

as fast as C++ or Ada

(wow, I really truly should go do some work in Ada!)

so I would say that if it is fast (even if not as fast as C++) and sucks less than C++, that can generally be a win :-)

I want both

Maybe I am expecting too much, but I want both performance and elegant structure.

Ada (GNAT) can be as fast as GCC; it's the same backend, but you need to use profile-guided optimisation. To do this I have used addressing inside some modules as an optimisation, but by using private types this is all hidden inside the modules.

You can see an Ada and a C++ implementation of my Monte-Carlo engine for Go on my GitHub. It's slightly unfair because I wrote the Ada second, and it has a cleaner structure. Also, the C++ version has a lot of stuff from Stepanov's Elements of Programming thrown in.

https://github.com/keean/Go-Board
https://github.com/keean/Go-Board-Ada

At the moment I seem to prefer type classes to modules, and Ada loses out to C++ on that front as C++ templates can model type-classes which you cannot do in Ada.

ATS

I think the language closest to what you're looking for, from what is available today, is ATS. ATS claims performance and memory usage on par with C, supports explicit allocation/deallocation (but also has GC), and integrates theorem proving, dependent types, and linear types. (But caveat emptor; I haven't actually tried the language.)

ATS

I looked at ATS, and it seems interesting - I think at the moment what I am aiming for is different enough to be distinct from ATS, but I should give ATS a go to see what it's like to develop with.

You win something here

At the moment I seem to prefer type classes to modules, and Ada loses out to C++ on that front as C++ templates can model type-classes which you cannot do in Ada.

C++ templates model type classes at about the same level of fidelity that void* models abstract data types.

Underestimating C++

I think you are underestimating modern C++: "Concepts" in C++ model type-classes exactly, and you can get the behaviour in C++0x with a bit of work and boilerplate:

http://www.cs.kent.edu/~jmaletic/papers/JSCP12.pdf

This uses constraint-classes and type-traits, which are actually quite powerful mechanisms. Hopefully Concepts in some form will get into the next iteration of C++, so there is some nice syntactic sugar to make it all less cluttered and more readable.
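As a hedged illustration (this is one common traits-based encoding, not necessarily the exact mechanism of the cited paper): a trait template plays the role of the type class, its explicit specializations play the role of instances, and a generic function only compiles for types that have an instance.

    #include <string>

    // 'Printable' acts as the type class; explicit specializations are its instances.
    template <typename T> struct Printable;   // no primary definition: unconstrained types won't compile

    template <> struct Printable<int> {
        static std::string print(int x) { return std::to_string(x); }
    };
    template <> struct Printable<bool> {
        static std::string print(bool b) { return b ? "true" : "false"; }
    };

    // Generic function "constrained" by the type class.
    template <typename T>
    std::string describe(const T& value) {
        return "value = " + Printable<T>::print(value);
    }
    // describe(42) and describe(true) compile; describe(1.5) fails at compile time.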

Still if I thought C++ was adequate I would not be trying to design a better language myself, so you have to consider the above within the context of comparing Ada and C++ not other languages which don't give the same level of performance.

context sensitive

maybe not true in some situations.

In practice

Tim wrote: "At a higher level, though, a GC is all about solving a problem that these developers just don't find to be that big of a deal in practice."

Unfortunately the devs have no idea what they're doing. The developers create such completely facile rubbish most of the time because they have to spend so much effort getting the basics to work at all that there's no time for anything higher level. The devs think they did a good job; what they actually did was so ridiculously bad that most gamers spend their time fighting the game interface and logic.

Show me one strategy game which is any good at all at routing troop movements, or at recording and executing standing orders. Or an action game where the art is in acquiring sufficient practice to defeat a hard-to-use interface.

The very fact Tim mentions "graphics" is indicative. The graphics isn't the game; it's just there to present the situation. Sure, it should look cool, but if more than 5% of the work is going into the graphics engine, the game is bound to be crap (I'm not talking about artwork).

There is memory pressure

[This was supposed to be a reply to the games and GC discussion, but ended up at the top level.]

None of the comments so far mentions one important point: Games usually have memory pressure, and are likely to keep having memory pressure. Adding memory is not an option.

I remember seeing research on the memory/CPU tradeoff for GC indicating that it needs about 3x more memory to have the same performance as malloc/free (measured using an analysis system that placed perfect free() calls in traces from Java programs). And I'd expect that real-time GC would need even more.

I suspect that the cost for games would be less (there are a lot of semi-static allocations that probably don't get ballooned 3x, e.g. graphics loaded to support a level), but even the remaining cost may be too high.

I'm also somewhat soured on GC in general after the amount of time I've spent tuning Java GCs for large-scale production use (in non-games systems). I'm stuck with the feeling that the Java GC system, at least, saves up all the difficulty of dealing with memory and hits you in the face with it at some semi-random later time.

3x memory?

That certainly doesn't correspond to my own observations and experiences. I wonder under which conditions the tests were run.

Frequently, poor performance of GC can be traced to interaction with operating system virtual memory. The OS will page something out, and the GC will page it back in, and with few exceptions they aren't talking to one another. I believe this issue should be addressed in garbage collectors. But, also, I think it might be a non-issue for console games where you control memory.

Java is mediocre in all things. I can sympathize with people who come out of Java disgusted with OO, abstraction, recursion, concurrency, static types, or GC.

3x memory was without paging

This was not related to paging, as far as I can tell - they brought that out as a separate issue. As for conditions - I dug up the paper - it's

Hertz, Matthew, and Emery D. Berger. "Quantifying the performance of garbage collection vs. explicit memory management." ACM SIGPLAN Notices. Vol. 40. No. 10. ACM, 2005.

which is available as a PDF. I can't see any serious fault in their conditions, and there didn't seem to be anything really serious in the previous LtU discussion.

I unfortunately can't find anything that cites this paper and seems to attempt a better analysis - more data would clearly be useful.

As for matching my experience: I don't feel my experiences are quantifiable enough to determine what kind of cost is incurred - I've too seldom written programs that are even remotely similar using the different technologies to feel I can judge. While my gut feel would have put the memory cost at much less, I can't really say that I think that gut feel has enough to go on.

3x with paging

The 3x figure is with paging, cf. section 5.2. "We assume a fixed 5 millisecond page fault service time." AFAICT, the paper never mentions disabling paging. But they do distinguish 'situation normal' conditions from those with 'scarce physical memory', where the page faults dominate.

I wouldn't generalize much from that paper. It focuses on precise GC, which is only relevant if you need GC-triggered disposal methods. Also, the 'liveness oracle' they're comparing against is better at deciding when to explicitly dealloc objects than humans will ever be.

Section 5.2 describe a separate test

The way I read this paper, section 5.1 describes the results when allowing different heap sizes and no paging, and section 5.2 describes what happens when the optimal heap size found in section 5.1 is run with various constraints on physical memory, leading to paging.

That's the only way I can make sense of the statement in 5.2 that goes "Figure 7 presents the total execution times for the Lea allocator, MSExplicit, and each garbage collector across all benchmarks. For each garbage collector, the fastest-performing heap size was selected", along with the Figure 7 graphs that show different "Available memory", and the lack of any specification of available memory for section 5.1.

As for precise vs imprecise GC: As far as I know, the state of the art in GC performance requires precise GC (since you can't do any copying if you're not precise), with conservative GC having implementation benefits but it is not believed to have performance benefits in typical many-small-allocation programs (like the ones tested above). Is there evidence to the contrary I'm not aware of?

Precise GC

A frequent definition I've seen for 'precise GC' is that all dead objects are guaranteed to be collected. Though, it's often used in opposition to 'conservative', where it means we don't confuse pointers with integers (which is a necessary precondition for the other sense of precise GC).

There are state-of-the-art GCs that are precise in one sense but not in the other, such as bookmarking garbage collectors that may keep pages of objects alive if they're not in memory, or multi-generational GCs that stop collecting the oldest generation. There also exist experiments with page-level or arena-level GCs, which may keep a whole arena alive for the sake of one object.

Anyhow, even going back to review 5.1, it isn't clear to me that they ever excluded paging from their runtime computations. OTOH, I don't wish to belabor that point any further.

The inventor of bookmarking garbage collectors is the author

Thanks for the clarification!

Just as background: The inventor of bookmarking garbage collectors is the author of the performance paper (the secondary author is his PhD thesis adviser.) They were both published in 2005; the bookmarking one a bit earlier (June vs October) but that is probably not significant.

Memory management in a game language

I see memory management as the perfect candidate for independent implementation annotations. I think garbage collection should be the default in any language, because it's the only reasonable thing to do in the general case without memory annotations. But you should also be able to carefully annotate your program to control memory layout issues. The difference between 'engine' code and 'gameplay' code can just be that someone has taken the time to carefully specify the memory management policies of the 'engine' code.

annotations

I like the idea, but it isn't really clear to me what sort of annotations we might use.

It would certainly be interesting to automatically recognize SAFL-like situations, where we can take many sophisticated functions and statically allocate everything we'll need for their computation, or perhaps use some simple observation on the argument to the function to decide how much to allocate.

Fuzzy on details

I have in mind a unification based scheme that works similarly to type inference, except that it's inferring representation relations. Annotations would serve a similar role to the one explicit type annotations serve in a type system. But this is still one of my fuzzier ideas. And honestly, I haven't worked on it in a while so it's way out of cache for me. As I recall, I thought it was promising.

Register allocators / latency

I don't know why video games and game code get all this special pleading these days. People used to make these arguments about programs like web servers and mail servers 10 years ago, but a lot of those have awesome implementations in GC'd languages. People used to make a different but related argument about register allocation about 40 years ago, because back then it was considered too hard to solve automatically. Of course, nowadays there is almost zero chance of a human beating a compiler at register allocation in general, and only marginal chance when writing assembly. We don't design programming languages to allow programmers to try their hand at register allocation anymore, and hopefully in the next decade we can just _stop_ trying and failing to design languages that allow programmers to do a poor job at memory management.

This focus on memory management is a total red herring, anyway. It's not about memory management (i.e. the cycle of allocation and freeing), it's about _resource consumption_ and _latency_. That means total system resource requirements, since games typically push the envelope w.r.t. media. Latency is about responsiveness to input and framerate. And forget the "overhead" in terms of space because GC inflates object lifetimes. It's far more important for a language to provide means of defining space-efficient data structures rather than trying to micromanage their lifetimes.

Sharing ownership

The language has to share ownership of the data with the operating system and hardware, so it cannot independently decide the lifetime of objects. Some objects, for example a geometry buffer shared between the GPU and CPU, will need to be kept alive until the GPU has finished with it. This may be indicated by some kind of call-back. If the garbage collector implementation is to remain free of these kinds of dependencies (which seems a good thing), the language needs to provide manual resource control for memory, device handles, GPU contexts, etc.

If it turns out that a given application does not need the garbage-collector, it should not have to pay the price. Also it would be nice to have different garbage collectors optimised for different uses.

This leads me to the conclusion that a language with C++-like constructors and destructors is needed, where garbage collectors can be provided as libraries. Having previously discussed garbage collector performance, it would seem optimisations specific to the GC implementation need to be applied to the objects using the GC, so the language would need to provide the ability for libraries to specify optimisations, possibly via some kind of reflection and AST-manipulation meta-language. This would also have the advantage of making GC research and improvement independent of the compiler and language implementation.

Handles

Handles are the solution to sharing GC'd objects with an uncooperative world. And sharing "ownership" with the OS is not really the same thing. Nearly all the cases you mention involve a temporal lockdown of data that is short-lived. You don't want to pollute your own nice code to manage the entire object's lifetime based on a few periods where it needs to be live for external processes. And for anything more there are cooperative tracing/distributed GC algorithms. And in any case the data formats shared between the OS and the GPU are completely specified--usually dense arrays of primitives. This is not at all the same problem as managing the objects whose layouts can and should be under the authority of the language and the runtime system.

And let me state as unequivocally as I can: garbage collectors are *not* libraries. At least not good GCs. Inserting barriers and optimizing them away are the purview of the compiler. Understanding the object layouts produced by the runtime system, walking the stack, and stopping threads for GC if necessary are key parts of the runtime/gc interface.

What's a library?

I think you may have stated more unequivocally than you can. Garbage collection is not a traditional library because of the way it interacts with cross-cutting representational issues, but I think it's a good idea to organize a compiler such that those kinds of issues are exposed at the language level in a way that GC-in-a-library makes perfect sense. A good GC, even.

Handles

This seems the wrong way around. The primitive data pointer and manual memory allocation are the simpler concept. Complex systems and abstractions should be built on the simpler ones. GC belongs as a layer on top of this simpler layer.

I also disagree about the short-lived comment. It's much better to allocate a single block of shared memory with the GPU, and use it to contain a shared data-structure at a fixed address, than it is to keep re-requesting the block over and over again.

As I pointed out, in order to write a garbage collector as a library the language has to provide the reflective capability to inspect object layouts, walk the stack etc. Most languages already include ways of stopping threads.

Overconstraining the runtime system

The primitive data pointer and manual memory allocation are the simpler concept. Complex systems and abstractions should be built on the simpler ones. GC belongs as a layer on top of this simpler layer.

The first statement doesn't necessarily hold. I don't agree that primitive data and manual memory allocation are necessarily "simpler" like you suggest, especially when you get to alignment, endianness, portability, and memory model issues. Your second statement suggests that layering is always the right answer, but layering can lead to abstraction inversion, especially when the simpler abstractions are the wrong ones. And I argue very strongly that building a GC on top of a system of manual memory management and manual data layout is exactly the wrong thing to do. That's because you require the language implementation to follow data layouts specified by the programmer and it loses the freedom to choose better ones. I believe you should only specify specific data layouts when it is absolutely required by external code or hardware, leaving the rest of the data representation decisions to the compiler and runtime system. Otherwise you are forcing even more complexity on the programmer and only offering new ways to make mistakes. And forcing data representations means that some implementation techniques are no longer available to GCs, e.g. making use of mark bits in object headers. By using a better abstraction that is _not leaky_, e.g. objects vs structs, the implementation is free to choose more efficient implementations.

I also disagree about the short lived comment. Its much better to allocate a single block of shared memory with the GPU, and use it to contain a shared data-structure at fixed address, than it is to keep re-requesting the block over and over again.

It depends on a lot of factors, not the least of which is whether the memory used is large enough or used infrequently enough that reusing it makes sense, or if you have hundreds or thousands of smaller buffers (e.g. textures) that you are trying to manage that are stored in the GPU memory. Again you confuse one special use case with the general problem of managing object lifetimes, as if you believe that a GC prevents you somehow from having a few large, long-lived objects lying around, or having objects that contain spans of external memory outside the heap, and especially objects full of primitive or raw data that are used for interchange with external systems. Most production GCs have special object spaces for large, long-lived objects of this type for exactly this reason. I'd argue that having a general purpose GC that can handle all kinds of use cases increases rather than decreases the programmer's options. You seem to harbor the belief that having a GC prevents you from interfacing with external systems and hardware. This is just not the case. JNI and typed arrays in JavaScript are notable examples. In fact typed arrays and WebGL do exactly what you seem to think is impossible: directly interface between a safe, GC'd language and the GPU. And for decades safe GC'd languages have offered buffers or byte arrays for doing IO directly with the OS. So you seem to keep arguing against a limitation that just simply doesn't exist, as demonstrated by many production systems that exist _today_.

As I pointed out, in order to write a garbage collector as a library the language has to provide the reflective capability to inspect object layouts, walk the stack etc. Most languages already include ways of stopping threads.

But why? Why do you care if the GC is a library? Do you care if the compiler you have has a nice interface for plugging in different register allocators or loop optimizers? That's a nice architecture to have--heck I even like building such systems! (see this paper for more context)--but this should absolutely not bubble up to the level of the language or application programs.

Simpler for the CPU

The statement was about being simpler for the CPU. This is easily measured by the amount of code required to perform the operation. A large runtime system is a clear sign the abstraction is not 'simple' from the perspective of the CPU.

I think you need the ability to precisely control the memory layout when you need to. Your statement about the compiler being able to choose the most efficient representation is just not true. It can't, due to the combinatorial explosion. You would need the compiler to behave like a proof search (it has to fully explore the combinatorial complexity space to find the optimum representation).

I think a language should make managing the memory and other resources as easy as possible for the programmer, but in many cases a simple object-handle on the stack is all you need to ensure memory safety.
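A minimal sketch of such a stack-held owning handle (essentially a stripped-down unique_ptr, shown only as an illustration of the idea): the resource is released deterministically when the handle goes out of scope, with no runtime system involved.

    // Illustrative owning handle: deletes its object when it leaves scope.
    template <typename T>
    class Owned {
        T* ptr;
    public:
        explicit Owned(T* p) : ptr(p) {}
        ~Owned() { delete ptr; }
        Owned(const Owned&) = delete;             // no accidental double ownership
        Owned& operator=(const Owned&) = delete;
        T* operator->() const { return ptr; }
        T& operator*()  const { return *ptr; }
    };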

For me the runtime system is the problem. The language should not need a runtime system, all the code that is in the final output binary should be visible in the source code for the application and libraries. Hiding stuff in the compiler or runtime system seems less transparent to me.

Of course, nowadays there

Of course, nowadays there is almost zero chance of a human beating a compiler at register allocation in general, and only marginal chance when writing assembly.

My experience is that it's still pretty easy to beat compilers at assembly. It's just not usually worth the costs (in effort and maintainability), even in many performance-critical cases.

That's usually because you

That's usually because you take the compiler's output as a first step.

Yup

Correct. And these days, we can often use compiler intrinsics as an intermediate step.

You probably won't beat a

You probably won't beat a compiler purely on the register allocation problem, but if you are allowed to change the structure and instructions of the assembly code then you usually can beat a compiler. Compilers are stupider than you think! There was a drama around this in the linux kernel: link

Compilers improve

Compilers continue to improve over time, and micro-architectures change, whereas hand-tuned code might as well be chiseled in stone. Effort invested in tuning one program benefits just that one program, whereas effort invested in automatic techniques benefits a large class of programs. Moreover, tuning for one micro-architecture can be detrimental to the next.