Design Principles Behind Smalltalk

With LtU being 10 years old, it seems appropriate to start recycling ancient underappreciated posts — especially those in the static HTML archive where commenting is no longer possible.

With Smalltalk-80 nominally turning 30 and PARC turning 40, 2010 is a fitting year to revisit Design Principles Behind Smalltalk by Daniel H. H. Ingalls, known to his German admirers as Daniel Ha-ha Ingalls.

Mentioned twice by Ehud in 2001 (here and here), Ingalls's piece should be filed under Visionary Languages. Alas, no such category exists on LtU.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

outline

The article at a glance:

Personal Mastery
If a system is to serve the creative spirit, it must be entirely comprehensible to a single individual.
Good Design
A system should be built with a minimum set of unchangeable parts; those parts should be as general as possible; and all parts of the system should be held in a uniform framework.
Purpose of Language
To provide a framework for communication.
Scope
The design of a language for using computers must deal with internal models, external media, and the interaction between these in both the human and the computer.
Objects
A computer language should support the concept of “object” and provide a uniform means for referring to the objects in its universe.
Storage Management
To be truly “object-oriented”, a computer system must provide automatic storage management.

This reminds me…  Aside from C++, are there any other mainstream-ish OOP languages out there that do not provide GC?

On with the principles…

Messages
Computing should be viewed as an intrinsic capability of objects that can be uniformly invoked by sending messages.
Uniform Metaphor
A language should be designed around a powerful metaphor that can be uniformly applied in all areas.
Modularity
No component in a complex system should depend on the internal details of any other component.
Classification
A language must provide a means for classifying similar objects, and for adding new classes of objects on equal footing with the kernel classes of the system.
Polymorphism
A program should specify only the behavior of objects, not their representation.
Factoring
Each independent component in a system would appear in only one place.
Leverage
When a system is well factored, great leverage is available to users and implementers alike.
Virtual Machine
A virtual machine specification establishes a framework for the application of technology.
Reactive Principle
Every component accessible to the user should be able to present itself in a meaningful way for observation and manipulation.
Operating System
An operating system is a collection of things that don't fit into a language. There shouldn't be one.
Natural Selection
Languages and systems that are of sound design will persist, to be supplanted only by better ones.

Compare to Some Words of Advice on Language Design.

other mainstream-ish OOP languages out there that do not provide

Objective-C 1.0 does not, Objective-C 2.0+ has it as optional except on iPhone.

[sorry, didn't see earlier comments]

Delphi.

Delphi.

A "Visionary" department is

A "Visionary" department is a good idea, I must say.

OO but unmanaged memory.

I think Objective-C has manual memory management.

A compiler option can enable

A compiler option can enable automatic memory management on OSX. It's still manual on iOS, however.

Operating systems shouldn't be

Here Daniel Ingalls sounds like Chuck Moore.

I don't think that this principle is standing the test of time well, although maybe it's a projection of another idea, that the machine should be cleanly exposed to the language.

re: Operating systems shouldn't be

Here Daniel Ingalls sounds like Chuck Moore.

Somewhat, although Moore had a very different notion of what a language should be like and what it should be like to use a computer. For example, Smalltalk had files and directories. Moore might say "why bother? just use raw blocks."

So, speaking of the Ingalls version:

I don't think that this principle is standing the test of time well, [....]

I don't think that the principle has yet been tested by time, but that it is now starting to be, and its prospects are good:

Recall that Smalltalk, "back then", was horribly expensive. According to Wikipedia, a Xerox Star would run you $16,000. At that same time, you could buy an Apple ][, TRS-80, or CP/M machine for perhaps $1,500. In 1984, Macintoshes started at $2,495.

The price of hardware, more than anything, forced the need for a conventional operating system and repelled pure language based systems. Smalltalk could actually run on early Macs although the word "run" is misleading: "crawl along painfully slowly" might be more like it.

Similarly, bang for buck-wise, early unix workstations soundly beat late lisp machines. It started to become apparent that lisp performance could be somewhat OK (for some purposes) on "stock hardware", especially if it isn't charged with replacing the OS, and again: hardware constraints pushed a C kernel underneath lisp.

Fast forward two decades and the hardware constraints have faded. High level languages still tend to be slow compared to low level languages (in the the general case) but, measured in bang for buck, the difference is increasingly unimportant. It's only over these past 20 years that programmer labor costs have come to so dominate relative to a few extra bucks for a machine fast enough for various high level languages. It's only recently the language-as-OS-replacement starts to look economically possible.

Is it plausible, though? Yes, look around. Several factors suggest it is. The common theme is that, these days, paradoxically, high-level-language-as-OS-replacement can reduce hardware costs. Examples:

1) We're not far from a circumstance wherein a personal computer can get by running absolutely nothing at all other than a web browser that supports javascript. Such a machine does not obviously need hardware enforced process isolation. Its approach to virtual memory can be quite a bit simpler than what conventional machines use.

2) Within the emerging market for commodity computing, HLLs (e.g., Java) have a strong niche. Self-contained platforms built around such HLLs reduce the labor costs of using commodity computing and afford suppliers with much more freedom to use non-standard hardware. Once again, CPUs and memory hardware can be simplified if all they need to do is, say, run lots of Java threads.

3) In the mobile market: gee, it sure is cute to have Linux running on your phone and such but if it could extend battery life while lowering component costs would you consider a Javascript-only phone?

4) Similarly to commodity computing, consider "cloud" or service-oriented computing. E.g., huge server farms offering "elastic" RDBMS. If the interface to such systems is a self-contained SQL universe you not only make it possible to ditch the generality of the underlying OS (and the hardware it needs) but make it easier to fine tune performance.

None of the above examples would help were it not for a final factor:

Chip design and fabrication becomes exponentially cheaper over time, for now.

I don't mean that it becomes cheaper to use the latest processes or that it becomes cheaper to, say, sustain the intel x86 architecture.

I do mean that, for example, if you decided to go for a very fast 10 core 68000-ish chip, that would not be unaffordable: not to a Google or an Oracle or such.

In summary, we are only just now beginning to enter a time in history when it is realistic to test the notion that "Operating systems shouldn't be".

Assuming civilization doesn't soon collapse - my betting money is that Ingalls was right.

The Tortoise and The Hare

I agree with the possibilties, that's why I was initially excited about the Fox project. And I'm far from sure you're wrong: the test of time is an ongoing thing.

But there are two animals in this race, whose performance decide if an OS is needed or not: besides the evolution of the programming language, we also need to consider the evolution of hardware. And it seems to me that the progress you describe means that software has caught up to where hardware was in the late 1970s.

Which high-level programming languages have good support for memory management? Do they do a good job of managing the cache? In fact, which microprocessors trust the machine code to be competently put together enough to listen to what it says about memory lifetime? Writing in C/C++, paying careful attention to issues of locality and scheduling of malloc() seems to be the best approach to these kinds of issue that we have. Maybe BitC will change all this... it's not clear that it will offer something that fits what we usually mean by "high-level", but if it does, then maybe we'll be able to say that PLs have caught up with where hardware was in the late 1980s.

Of course, with cheap chip design and fabrication, affordable custom retro hardware becomes an option. Then the 2010s can be the 1970s all over again, and we can roll our own OSless bliss.

Languages friendly to memory and caching

Which high-level programming languages have good support for memory management? Do they do a good job of managing the cache?

Yours would be fairer questions if you first give high-level programming language implementations a decade of full access to TLB, page protections (such as copy-on-write barriers), paging, and interrupts possessed by modern kernel implementations.

As the situation exists today, implementations for many 'high-level' programming languages operate in competition with the kernel. Garbage Collection without Paging demonstrated a 218x improvement in latencies and a 41x improvement in throughput by integration of GC and the kernel's paging system (requiring a patch to the Linux kernel). Intelligent, rational people often balk at the notion of a high-level language-as-an-OS because GC, in their experience, introduces so many performance issues - never mind that GC has never had a fair, portable opportunity to leverage the hardware features because the OS kernel is hogging them!

(GC without paging is also a fine basis for GC with orthogonal persistence, though we'd need the full 8 octets per reference - with 12 tag&GC bits per 52-bit reference reference to 16-octet words scales to 64 petabytes per volume, and also allows 'small integers' and doubles be stored locally - which is good for the cache.)

This isn't to say we wouldn't benefit from more 'cache-friendly' language constructs and concurrency models.

One issue with OOP, for example, is that 'new' objects are rarely allocated in the most cache-friendly of configurations, not even for the extremely common pattern of static-after-construction-and-injection configurations of objects. If we described and constructed 'new' object-configurations (with predefined, potentially cyclic relationships between objects), then a compiler could provide these as a cache-optimized slab of objects and specialize based on the dynamic object types in the static configuration.

But OOP would, even with that adjustment, not be the most cache-friendly of paradigms: objects help organize explicit mutable state, but do very little to avoid it, and mutable state [unless used in a disciplined manner] is not 'cache-friendly', especially in a multi-core environment. Oddly, much of this "explicit mutable state" that causes objects to be cache-unfriendly is the objects (and developers) painstakingly maintaining their own 'caches' - as of responses to asynchronous queries, and calculations performed upon these. A functional-reactive style for maintaining the query and calculations would have been far more cache-friendly, but not easily composable (the 'object' interface interferes with composable queries). A far more cache-friendly paradigm would simply expose methods as reactive-interactive behaviors. ([edit] Due to some confusion at reddit, I'll note: I'm not suggesting we avoid mutable state entirely. But I would like to see us eliminate explicit use of non-essential state, such as where state is redundant - could be regenerated.)

Cache-friendly programming - albeit, in the context of distributed object-capability programming with concerns about network latency, disruption, replication, partitioning, and resilience - was near the top among the motivations that led me to design the RDP paradigm... though I had other stumbling blocks, motivations, and inspirations on the way.

Besides developing more cache-friendly paradigms, we can improve the implementations. Continuation passing styles favored for higher-level languages, for example, isn't as "trivially" cache-friendly as stack-based programming. That doesn't mean it cannot be cache-friendly, but to make it work we need to target another implementation property: Garbage Collection.

We need some 'truly' concurrent collection, in the sense of multiple cores performing collections in parallel. The high degree of concurrency by integration with the page-memory and context protections oft used between an OS kernel and the process - i.e. write protections to interrupt a core that is about to update a page that another core is collecting. We give each core its own, dedicated nursery and first-generation (copy collection) pages of 16 to 64 kB each (depending on the core's L2 cache size). Each core is responsible for collecting its own nursery in a loop: pages A,B,C if you reach the end of page C, you collect page A. (The loop works due to CPS rather than stack-based computation.) All the nursery pages should fit into the L2 cache, with room to spare for code and external data. Most collections are just on the Nursery generation, but one occasionally sweeps or copy-collects the 1st generation when obtaining new 1st-generation pages. If a reference is to be created from outside the nursery into the nursery, an immediate collection is performed instead, moving the target data into the first-generation space. This simplifies collection of nurseries (since only nursery pages and the core contain a reference into its nursery), ensures that nurseries are 'volatile' with respect to orthogonal persistence and inaccessible to other cores.

Stack-based programming isn't friendlier to caches than per-core nurseries rotating in a core's L2 cache. Since the nurseries are copy-collected, allocation is simply incrementing the nursery pointer and checking whether it is still within the nursery page. That extra check is local to core registers, with a cost well masked by branch-prediction.

As an added benefit, CPS is much friendlier to garbage-collection, to recursion, and to fine-grained concurrency. Each 'thread' is a continuation, idling in a work-stealing queue. And logically "preemptive" multi-threading can also be incredibly more efficient and sane for language-as-an-OS: the compiler-in-the-OS can easily inject yield points (or perhaps 'countdown-to-yield' points) at relatively sane locations rather than depending upon expensive clock-based interrupts at arbitrary computation contexts.

Other cache-friendly properties for languages:

  • Copy-collection is very cache-friendly. Locality of reference can actually improve over time.
  • We would benefit from favoring 'ropes' rather than arrays or lists. Ropes can greatly improve locality, but can be 'chunked' at sizes that don't bugger the concurrent garbage collectors. There is rarely such a thing as O(1) access to a large array anyway: the cost of an array is determined by the virtual-memory or paging organization, which is typically O(lg(N)). Ropes atop finger-trees would offer O(1) amortized access and update to either edge. Ropes can also be useful for 'split and combine' data parallelism and sorts, since a split and combine can be O(lg(N)). Actual semantics for ropes could be based off of lists (similar to how semantics for natural numbers are oft presented in Peano arithmetic) - in which case developers are told to "use the built-in sorts and folds for best performance".
  • Similar to lists-as-ropes, it would be nice to have first-class-sets and have these represent under-the-hood via kd-trees indexed on flexible dimensions once they breech a certain size. This achieves similar locality properties for complex volume queries (quickly isolating then filtering smaller volumes, as opposed to scattered data pointers all over), doesn't require an ordinal property for the elements, and can leverage features of elements that are simply painful to install by hand.

Anyhow, while I'll agree that "writing in C/C++, paying careful attention to issues of locality and scheduling of malloc() seems to be the best approach to these kinds of issue that we have", I believe this is because higher-level languages expend a lot of resources doing battle with the C/C++ based kernel. If our kernel was written in some higher-level language, perhaps we'd all be looking at programs written in C/C++ and discussing how ineffective they are for caching, efficient memory, and concurrency (and egads! look at those horrendous 'imperative' adaptor APIs to the reactive kernel!)

If we were to look at languages and decide which uses the cache most effectively at all, I suspect our answer wouldn't be C/C++. It'd be SQL.

They've had it

Yours would be fairer questions if you first give high-level programming language implementations a decade of full access to TLB, page protections (such as copy-on-write barriers), paging, and interrupts possessed by modern kernel implementations.

Actually, they've had it. The JavaOS efforts, for example, started in 1989 and 1990 on microkernels that fully exposed these attributes. While it's true that L4 remains a microkernel in C, the overhead of L4 on such operations is low enough to be irrelevant for purposes of testing this proposition.

What's unfortunate is that the experiments never got enough energy behind them to continue, and that they were conducted primarily by OS people rather than PL people. This may have something to do with the fact that academic PL researchers can't get tenure (or, indeed, publication) for "engineering" results...

High Level Programming Language Implementations

One point about language-as-an-OS is that it doesn't necessarily require that the kernel be written in the same language as the language it hosts. Many Java implementations are written in C. (Similarly, writing a kernel in a high-level language does not imply that the language provides something better than the Unix API.) [edit: though, as Thomas notes, the more kernel written in the language the more flexibility you'll have.]

The essential property of language-as-an-OS is, to my understanding, the fact that applications are written as regular configurations of objects in a language, talk to one another using the language's datatypes, etc.

I'm curious how much of JavaOS was written in Java, and the extent to which Java was the application language.

Several experiments

I'm curious how much of JavaOS was written in Java, and the extent to which Java was the application language.

There were several Java/OS attempts that explored different divisions between native and managed code. To my knowledge, all of them used Java exclusively for application-level code, except where that application code provides tightly-integrated backing support to a microkernel.

There was an implementation of Kaffe on L4 that used L4 for context switching and memory mapping and did everything else in Java. Another one was built on OSKit. There was a bare-metal Java OS attempt as well, but I don't now recall the name.

On reflection, much of the demand for so-called "native code" had to do with meeting latency requirements. The PL community, having asserted for decades that temporal properties are non-semantic and therefore have no part in language design, has (no surprise) produced languages that have no capacity to manage temporal requirements. There are exceptions to that, but as a broad generalization it's fair.

The closest I have seen is probably Midori, about which I really can't say very much. I can probably get away with noting the obvious point that C#'s support for unboxing is very limited, and that any approach that wants to use C# for low-level implementation is going to have to cheat on that point somehow.

My general observation is that anything using a migrating collector is very hard to integrate sensibly with DMA. Beyond that, operating system kernels tend to require relective understanding of their own memory maps, tend to have fairly tight real-time response requirements, and tend to require non-cooperative preemption schemes. This does not preclude the use of safe languages or GC within a supervisor, but it certainly means that the core parts of the supervisor must not be perturbed by GC.

This memory discipline division tends to render moot the hypothesis that language-based protection may obviate hardware-based protection: a firm layering of GC and non-GC runtimes is required, and a copy boundary tends to be mandated where two heaps cannot sufficiently collaborate on collection policies. Beyond this, current hardware requires that certain operations be performed in supervisor mode, but don't handle interrupts sensibly when applications run in supervisor mode.

All of this conspires to make experiments "challenging."

Temporal Semantics and Latency Requirements

much of the demand for so-called "native code" had to do with meeting latency requirements. The PL community, having asserted for decades that temporal properties are non-semantic and therefore have no part in language design, has (no surprise) produced languages that have no capacity to manage temporal requirements. There are exceptions to that, but as a broad generalization it's fair.

I totally agree! Computing needs time, at least if we want to control it.

I was among said naysayers even just four months ago, believing that a good implementation can (mostly) take care of itself. I guess part of my skepticism was due to the rather poor efficiency and scalability of many temporal logic implementations (update-world-every-tick doesn't scale to an MMORPG). And I was also skeptical of secure composition (e.g. if independent vats attempt to synchronize logical time, then any vat can maliciously interfere with progress of other vats).

I started favoring temporal semantics when I was struggling to control physical latencies for deep composition in RDP. Supporting a large degree of indirection is important to keeping the language simple, composable, productive. But, while running through user-stories, I felt a lot of pressure to cut out middle men, avoid indirection, avoid delegation, and generally make my own life miserable and complicated in order to more effectively control latencies. Every indirection was adding latency - which, in some domains, is more important than developer-side complexity. Worse, every indirection was adding an unpredictable amount of latency based upon scheduling decisions, physical timing, and effectiveness of other optimizations - cache, memoization, lazy evaluation. Unpredictable latency causes indeterministic race-conditions, hinders verification, made me feel bad about the fake products I was imagining in my user stories.

Temporal semantics are the best solution I was able to discover in the literature. Operations happen on logical time, which keeps the system verifiable. Logical time can be directly correlated to physical time, and logical latencies will mask physical latencies. Subgraphs of larger applications may even compose with 'zero' logical latency - so long as enough slack is provided at the membranes that the CPU can 'catch up' and bound the resulting time-leak. Without so much effort, temporal semantics allow users to verify systems, control latencies between subsystems, and logically eliminate latencies for composition within subsystems.

This was... very nice. But I also gained a lot of side-benefits that I wasn't expecting. Temporal semantics allow me to express streaming data and streaming events - and synchronize then - without explicit state. This is an enormous expressiveness advantage for developing multimedia systems, UI, robotic control. Most accidental complexity, most bugs, are related to state management... so avoiding state in such a broad array of domains is a very good thing. Temporal logic actually reduces latencies, improves efficiencies. Rather than a periodic update of latitude/longitude at 10Hz, I can build a simple model that predicts a path along a simple arc and update whenever every 10 seconds or so when the prediction becomes invalid - thereby saving 2 orders of magnitude in cascading update costs, and greatly improving system-wide interpolation. Heck, by peeking a little into the "anticipated" future, one may trade accuracy in order to have negative latencies. That's a darn sweet engineering trade-off for, say, planning robotic motion in a combat scenario.

Temporal semantics are pretty much all positive.

I find it sad that it took so many user-stories, and so many years, to convince myself to give temporal semantics another serious try. I now truly appreciate Conal's Push-Pull FRP and Reactive library, or Dedalus from Berkeley. Yet I suspect most of the PL community is where I was, with more skepticism and less appreciation of temporal semantics than temporal semantics properly deserve. Reactive and Dedalus aren't quite what I want for robust composition, parallelism, and secureable distributed applications... but they are very elegant at what they do.

Implementation is critical, however, if temporal semantics are to ever gain acceptance. I was soured on the idea back in university due to experience with poor implementations. We need mechanisms that control latency and consistency without sacrificing much throughput, efficiency, cache coherency, parallelism. My current pursuit is vat-flavored: a vat is a thread with an event-loop and a logical time, each deterministically maintaining a set of agents (often with 'zero' logical latency). Different vats communicate only at instant boundaries, providing a basis for glitch-free inter-vat consistency. Vats within a process are allowed to drift by a developer-controlled amount - such as 300 microseconds. Since there is a fixed upper-limit to temporal drift, one can achieve inter-vat determinism... e.g. by reading at T-300 and writing at T+300 between vats for a total logical latency of twice the drift (600 microseconds). Alternatively, one may make an engineering tradeoff, reducing inter-vat consistency to improve logical latencies (i.e. assuming that most vats, most of the time, aren't at maximum drift). My performance hypothesis is that allowing this logical drift will reduce synchronization overheads relative to lockstep approaches, increase efficiency for parallelism, and likely improve cache-coherency. I'm not yet to empirical trials, though.

Anyhow, I strongly agree that achieving latency requirements is very different from supporting efficiency requirements. The two are tightly correlated only when the relevant CPUs are running hot most of the time. We could certainly get by with a somewhat less efficient OS if we had better support for controlling latencies end-to-end (including application composition). What makes an OS "feel" fast is a small, predictable latency - not efficiency. Further, I believe that well-controlled engineering tradeoffs allow us easily to have both!

Of course, this control certainly requires eliminating the hideous, implicitly timed, imperative, hierarchical calling structure (call and wait) models of programming seen in most imperative and OO languages.

My general observation is that anything using a migrating collector is very hard to integrate sensibly with DMA.

The easy answer is: don't try. DMA pages just wouldn't be collected - that's easy enough to mark in the page header. Similarly, pages representing ROM resources would not be touched by a collector.

I actually had designed features for that sort of contengency built into the collector I was describing to Sandro earlier. One could simply mark pages as untouchable (one bit in the page header), and the collector would forever respect that. My motivation, at the time, was to support 'applications' as a frozen images - which would then be heavily optimized by offline memory-ordering, compilation, and garbage collection tools. Since I had a page-level collector, new image data would only append to the set of frozen pages in the file.

I'm not pursuing that approach to applications any more, but I think anyone developing a collector for the kernel would quickly come up with a solution similar to my own: if collecting something is problematic, but the problem is finite and controllable by other means, then simply don't bother.

operating system kernels tend to require relective understanding of their own memory maps, tend to have fairly tight real-time response requirements, and tend to require non-cooperative preemption schemes.

Agreed.

This does not preclude the use of safe languages or GC within a supervisor, but it certainly means that the core parts of the supervisor must not be perturbed by GC. This memory discipline division tends to render moot the hypothesis that language-based protection may obviate hardware-based protection: a firm layering of GC and non-GC runtimes is required, and a copy boundary tends to be mandated where two heaps cannot sufficiently collaborate on collection policies.

We do need memory discipline, but that doesn't require we jump straight to a particular solution such as "a firm layering of GC and non-GC runtimes". I'd like to avoid any 'layering' at all - perhaps other than Xen or VMWare to improve acceptance and cloud portability. And I believe it possible.

The relevant point here is that we need, for certain regions, to control interference from collectors outside local control.

A solution I took pursued for that same point (and described to Sandro earlier) is true concurrent collection. And by "true concurrent collection" I don't mean one core is collecting while the others do their thing. I mean multiple cores performing collections, in parallel, for different regions of memory. Each core has its own nursery and first-generation pages, its own collector to handle them (for cache coherency), its own work queue and scheduler (with work-stealing). Certain pages are unshared and off-limits to outside collectors (including nursery pages) and are, in fact, protected by a copy barrier (would collect into first-generation before sharing data between cores). Ownership of first-generation pages could also be changed, when necessary.

I'm not saying that solution was a better one, but it certainly was a different one. I'd like to keep looking for different solutions until we find one that doesn't require layering, and that achieves the other goals - real-time properties, preemption, et cetera. I do believe that we could do much better for real-time latencies than anything modern kernels achieve.

I believe much of the problem with attempts at an OS based on higher-level languages such as Java is the "such as Java" part. C# and Java are simply bad languages for... well, I can't think of a good way to limit that statement. I don't have much respect for OOP as a paradigm for organizing computation, nor for message-passing as a concurrency model, and especially not for the synchronous abomination that is message-passing in OOP.

There are sane and reasonable alternatives. Why not, for example, an OS based on Lustre? It has already proven itself for embedded systems.

Beyond this, current hardware requires that certain operations be performed in supervisor mode, but don't handle interrupts sensibly when applications run in supervisor mode.

I smell an interesting story. Care to share?

Ropes the Ultimate

A lot of response. It sounds like you agree with me about where we are, but are more optimistic that there's a path out.

We would benefit from favoring 'ropes' rather than arrays or lists. Ropes can greatly improve locality, but can be 'chunked' at sizes that don't bugger the concurrent garbage collectors

This is a very interesting idea for me; it fits with thoughts I've had about cyclic sharing for the λ-calculus, which leads to interest in what we might call higher-order rope. Has what you say about the benefits of ropes been developed anywhere?

Has what you say about the

Has what you say about the benefits of ropes been developed anywhere?

I used ropes for lists to improve cache-locality and simplify page-based GC in my earlier (actors plus reactive expressions based) implementation of my Awelon language. This was inspired by seeing 'binary words used to represent Peano naturals' in plenty of existing languages. My development of RDP since April has caused me to scrap my older work (developed painstakingly in C++) and switch to Haskell where I can better model a new reactive paradigm (independently of a new language).

Haskell has plenty of finger-tree rope libraries but mostly for characters only (e.g. trees of bytestring chunks), and does significantly improve locality for them, but does not use them in the manner I suggest (OTOH, Haskell still has a stop-the-world collector). I do not know of anyone else who has tried.

By the way...

I obviously suspect that the availability of these finger tree libraries in Haskell just isn't at all the main thing that makes the language especially appealing to you, but speaking of which (& you know, "just in case" as they say... ;) -- XSLT Guru, and unsurprisingly strong proponent of the "FP does indeed matter" idea -- Dimitre Novatchev did a very nice, general-purpose-in-mind implementation in C# of this Swiss Army Knife of Data Structures.
[Edit: & in the hope I'm not totally off-topic... hmm]

benefits of ropes if developed

Charles Stewart: Has what you say about the benefits of ropes been developed anywhere?

I plan to do so, and I played with them in the past, but have nothing publically visible now. (I had to search for "rope" on wikipedia for a computer science sense to see it if matched my own idea developed independently a few years ago.) Apologies if I seem verbose in what follows. This is all a long vote in favor of "ropes are good and important."

I work in embedded networking contexts with real time requirements, where one cannot touch too much memory at once without missing a deadline to return control to network polling. So for example, any large contiguous data structure I create must be initialized incrementally. (I do the kind of caching that needs resources at a problematic scale.)

Generally speaking, one must deal with two conflicting requirements: 1) large data sets are needed to scale, and 2) contiguous and monolithic sections of address space are problematic and effectively forbidden except for special cases which cause provisioning hoops to jump through at boot time. Thus scatter/gather is the canonical technique for dynamic runtime demands for variable-sized data which might get large.

A few years ago I decided any future language runtime I create will use scatter/gather for all strings and arrays, using a btree for scale with leaves consisting of contiguous subsections. This is the same as a rope, but my original model was files in the Exodus storage system, where leaves are persistent block nodes. Because I had already worked out how to do copy-on-write incremental update of Exodus style btree files, I had planned to present apparently immutable strings and vectors that could actually be edited by copy-on-write that merely patches any part a changed tree, for efficient sharing that doesn't require changing now immutable memory.

My main design motivation for strings and vectors in rope format was a desire to avoid ever allocating any contiguous section of address space bigger than a size fitting inside a garbage collection heap node -- say about 64K in size. For example, you might impose an upper limit of 16K per contiguous string leaf node, but combine many of these in different heap nodes to create logical strings of arbitrary size using a btree; even when "immutable" such strings can be edited by patching out any leaf node changed by an edit, along with any inner tree nodes on the edit path.

This discipline -- fitting all leaf nodes inside heap nodes of limited size -- aimed to make incremental gc simpler by breaking applications into several lightweight processes of disjoint lightweight address spaces, each composed of some set of heap nodes, which could be collected independently (typically via copy-collection for simplicity). So latency to gc one lightweight process could be decreased by limiting lightweight process address space size.

In "real" applications I would put most state out in the C world, in buffers managed by flow control, but referenced by relatively small lightweight processes in a high-level language, to avoid needing to collect the bulk of memory consumed by actual data traffic. So gc would mainly affect state controlling executable structures like continuations, and not the large bulky data sets being manipulated.

I'm trying to say I think ropes are strictly required somewhere to meet performance requirements, including real-time latency issues, when there's a lot of state processed incrementally and asynchronously. But I'm only likely to implement this in a language intended to be used as a simulation test harness pretending to be the entire network environment in which embedded components must operate correctly. Such a simulation environment must be as efficient as possible for stress tests to be meaningful. Some bugs cannot be found unless concurrent things happen fast enough.

History of Ropes

I plan to do so, and I played with them in the past, but have nothing publically visible now. (I had to search for "rope" on wikipedia for a computer science sense to see it if matched my own idea developed independently a few years ago.) Apologies if I seem verbose in what follows. This is all a long vote in favor of "ropes are good and important."

The Wikipedia article is hugely inaccurate. Ropes were first developed by Butler Lampson and his band of merry hackers at Xerox PARC, as part of the lightweight API for Cedar Graphics. Cedar Graphics can be thought of as the predecessor to PostScript. Boehm, Atkinson and Plass did not introduce the concept. You can find references to "Cedar ropes" via Google, which shows that they were used in other applications besides networkable printing, such as networkable voice/over. The innovation of Boehm, Atkinson and Plass was to provide a different object representation for Ropes to better approximate the needs of string manipulation in most applications... They also argue that their contribution is the first explanation of how it can be implemented:

The idea of representing a string as a tree of concatenations has been rediscovered multiple times, particularly in the context of attribute grammars (cf. Reference 5 or Rodney Farrow’s Linguist system). Here a fast concatenation operation is often sufficient. We do not know of other published generalizations to a comprehensive collection of string operations.

Cedar Ropes themselves have evolved since 1982. They are mentioned in passing, but not described, in References 19 and 20. References 21 and 22 each contain a one paragraph description of ropes, with no implementation details (some of which have changed since then). Reference 21 also discusses some applications of ropes.

Variations of ropes have also appeared in DEC SRC programming environments. Modula-2+ provided both an interface and implementation similar to Cedar ropes. Modula 323 provides the interfaces, but the current implementation uses a flat representation. (Though we have not done a thorough study, this choice appears to at least double the heap space requirements of one of the Modula-3 programs we did examine.)

Why doesn't Cedar explain the implementation of Ropes? Because it is the interface that mattered:

An instance of a ROPE is an immutable collectible sequence of characters. Because ropes are immutable, the sequence of characters denoted by a given rope never changes; that is, every operation creates a new rope. As a result, ropes can be safely shared between programs without concern for storage ownership or synchronization. The rope interface provides a large set of useful operations on ropes, including rope concatenation, rope comparison, subrope extraction, and rope scanning. For efficiency, the standard implementation of ROPE represents a rope as a directed acyclic graph with heuristics that attempt to limit its depth. Therefore most rope concatenation and subrope extraction operations can be performed by manipulating small numbers of these structures rather than copying large numbers of characters. A client can provide a customized implementation of ROPE by implementing a small set of basic operations on the new representa-tion. The combination of convenience and efficiency of Cedar ropes have led to their widespread use at all levels of the system.

A Structural View of the Cedar Programming Environment. D. C. Swinehart, P. T. Zellweger, R. J. Beach, and R. B. Hagmann. (1986).

Hot Button: By Boehm, Atkinson and Plass arguing for a more concrete implementation of Ropes, they missed the spirit entirely.

Interesting Work

So you also used ropes to simplify incremental GC, allocation, and latency issues. Thanks for sharing your story.

Has what you say about the

Has what you say about the benefits of ropes been developed anywhere?

I've seen ropes implemented on top of finger trees, as David Barbour suggested: Programming Finger Trees in Coq See the second reference on that page, Programming Finger Trees in Coq, for Sozeau's ICFP'07 paper.

Sozeau gives David's claims. It is likely David saw this paper or one like it and forgot where he derived the guarantees from.

Haskell's Data.ByteString.Sequence library and Sozeau's Coq work both view ropes as a specialization of finger trees.

I've mentioned ropes on LtU many times before in the context of optimizing buffer presentation for text editors (such as the ones in free-form textual IDEs).

Ropes (in the regular B-tree

Ropes (in the regular B-tree format) I heard about way back as a in high school - some fourteen years ago - where I implemented as a for-credit project once in C and once in Pascal. The locality properties of ropes are pretty darn obvious and intuitive to anyone who has ever implemented them or groks their implementation, and also understands a little about CPU caches, virtual memory, and Amdahl's Law. It isn't as though I need a paper to tell me such obvious things. Ropes are older than I am [edit: from your earlier post Cedar ropes are referenced as 1982. If they're the first, they're just a little younger than I am.]

Now, finger-trees I didn't hear about until recently, a few months after joining LtU. They're pretty darn cool. Next time I do ropes, I'm using finger trees.

That said, I took "what you say about the benefits of ropes" as referring to its usage in keeping small chunk-sizes that won't bugger the GC. To the best of my knowledge, that has not been developed anywhere else. [Correction: the best of my knowledge now says it has been independently developed elsewhere by individuals such as McCusker.]

Also improve Unicode support

The ropes approach is pragmatically necessary for any efficient Unicode string implementation. It eliminates the need to choose a single code point size within the internal representation of the string.

I haven't double checked, but I think the ICU components are implemented in ropes-like form. From memory, they did it both for substring representation sharing, but also because two major XML standards disagreed about the character content indexing rules in such a way that indexing by both code points and ucs2 units in near-linear-time was required. From memory, their strands have homogeneous code point representation, so that have O(log n) indexing operations, but O(1) next/previous behavior.

Doing it well definitely puts pressure on the optimizer.

40 years later, there is still much to be learned from Cedar/Mesa. Does anybody know if detailed language specifications for those were ever written? If so, are they available on the web somewhere?

Butler Lampson's page as good as any

Mesa (1972-79): With Jim Mitchell, Chuck Geschke and Ed Satterthwaite, I designed this programming language [13, 23; Geschke et al., Early experience with Mesa, Comm. ACM 20, 8 (Aug. 1977), pp 540-553; Mitchell et al., Mesa Language Manual, Technical Report CSL-79-3, Xerox PARC, 1979 (23a)]. It is based on Pascal, but has unified facilities for coroutines and parallel processes, and for specifying interfaces among many modules in a large system. I designed much of this.

23a points to:

Mesa Language Manual

Cedar (1979-82): With Jim Horning, Paul Rovner and others, I designed the extensions which produced Cedar by adding a safe subset, automatic storage deallocation, and runtime types to Mesa. I wrote a definition of Cedar semantics in terms of a much simpler kernel language, and a language manual [32a]. See Cedar nucleus.

32a points to:

A description of the Cedar language.

GC without paging is also a

GC without paging is also a fine basis for GC with orthogonal persistence, though we'd need the full 8 octets per reference - with 16 tag&GC bits per 48-bit reference reference to 16-octet words scales to 4 petabytes per volume (more if you use larger chunks and special regions for large binary structures), and also allows 'small integers' and doubles be stored locally - which is good for the cache.

I think region-based memory management shows the most promise here. The storage analysis aggregates the cost of these pickled references over an entire memory region rather than individual references.

Region-based GC

I disagree, at least if you mean region-based management as the primary mechanism.

Fact is, relative to paging, a mark-and-sweep-and-bookmark costs almost nothing. Further, by performing a mark-and-sweep-and-bookmark, you can eliminate pointers that point away from a region, thus improving opportunity to collect the other pages later. Further, copy-collection is still important for locality and caching. I did support collecting whole pages based page-to-page reference tracking in the headers (since that was an uber-common-case optimization, and cheap!).

But my implementation would have been inefficient - losing too many opportunities for collection - without also performing the regular intra-page collections. And, while this may be pride speaking, I don't really see how to do much better than my implementation on that particular aspect.

For any given page I wanted the ability to quickly narrow down which other pages might be pointing at it. This was a primary goal, to support incremental copy-collections. I don't need perfection, but if I assume ten-thousand pages (625 MB persistent image at 64kB pages) I'd like to quickly narrow it down to the twenty or thirty that might contain a reference to the page I wish to copy-collect. Copy-collection was itself useful for keeping the image small... e.g. I could copy-collect a page from the end of the file to spaces in the middle.

I also wanted cheap allocation - cheaper than a list-walking malloc() typically gives me, i.e. a simple pointer increment. This was important since my languages are mostly pure-functional, so the most important operations to optimize are allocation, initial construction, and GC. Very little data is edited in place, so I allow that to be more expensive. In pursuit of this, only garbage-collection uses the holes opened by mark+sweep, and does so by occasionally filling the holes with copy-collected data. Regular memory allocation, whether in a nursery or (for ropes and other compiler-chosen or developer-annotated points) in 1st-gen memory, is all 'incrementing a register' into a page.

The page-headers were simply kept in a binary rope, which is extended and shrunk, as necessary, in the normal geometric fashion. Each page-header was fixed-size. Since page-headers were not actually part of the pages, I had much better locality for working with them (cf. the paging without collection guys needed to keep one in four pages in memory just for the headers). Because headers are fixed-size, I could not keep a precise list of which pages refer to other pages. But what I could do is keep an imprecise list: I kept 28 8-octet slots per header (plus four more for other data) that described which other pages reference this one. Overall, the 256 octets overhead per 64 kB page is ~0.4% overheads, which isn't horrible. With 28 slots, I was precise for up to 28 inter-page references. At 29 references, however, I'd need to combine slots: each slot had quite a few bits for meta-data other than identifying a particular page, so some of that meta-data was used for tracking 'ranges' of consecutive pages. E.g. if four or five pages referencing another are consecutive (and they very often are) then I only need one slot for the bunch of them. But if I ran out of slots, I'd need to combine slots and include intermediate pages that don't actually reference the target. I could choose these slot-merges so as to minimize the number of extra pages included.

This allows scaling suitable for interned values. E.g. a page of interned symbols is likely to be nigh-saturated with external references, but it still takes only 224 octets to carry them all. It also scales down nicely: common pages, as for 'big data' (large ropes, especially binary ropes), don't have very complex inter-page relationships. Most of the time, 28 groups of consecutive pages is far more than needed to precisely indicate which pages reference one another. (Indeed, I was even wavering between 12 slots and 28, but decided on the larger number.)

GC then uses these page-headers and saturation values, along with other useful information (copy-collection generation, binary pages, how many other pages would need paged-in) to heuristically choose a "set" of pages to collect (past the nursery generation, which is special). Within each set, some pages are subjected to mark+sweep and others to copy-collection - with the copy-collection filling the gaps in the mark+sweep pages. Collection on non-overlapping page-sets can happen concurrently. I did not have GC-without-paging, but rather GC-very-aware-of-paging - incremental, concurrent, and not-quite-real-time (since I couldn't guarantee GC threads against OS preemption). When paging did need to occur, I could start it asynchronously and process other pages first.

Pages copy-collected are returned to the pool of empty pages and may be removed from the referenced page-headers (possibly splitting a ranged slot). Any pages that as a result have no incoming references may also be collected, at the whole-page level, without even paging them in. This is where the region-based management really shines - indeed, collector heuristics can even aim to free regions in this manner, by targeting the pages that reference the regions.

I eventually puked all over this work when trying to add STM support (in addition to GC and orthogonal persistence) back when I was thinking that Transactional Actors would be the next big thing. That was more than I could chew. My implementation basically "modeled" various hardware constructs - including TLB and paging, and even using thread-local storage to keep extra 'registers' - in order to give me the control I needed to control paging and ensure consistency-on-disk. That layer hurt performance a good deal, so while performance was acceptable it wasn't great. I was fighting with the OS, and losing. I think that if I try this again, I'll be targeting Xen or similar - perhaps integrating with a pared down BSD kernel (BSD since Linux is GPL, which I dislike). I'll also skip on STM, since RDP doesn't need it (using glitch-free vats instead).

Anyhow, my point is that I never did figure out how to get region-based management to work efficiently on its own, especially if I wish to keep expressiveness. I consider it to be an optimization or opportunity that can be leveraged when recognized. I think the main issue comes down to scalability and efficiency and complexity concerns when keeping precise inter-page references. Reference counts don't work, even for pure functional, because lazy evaluation is a form of mutation and can produce cyclic page references - and, really, most languages will have at least some mutation. On a smaller-scale, reference counts have terrible cache-coherency and efficiency and atomic-update issues, no matter where you store them. (IIRC, Thomas Lord has oft expressed interest in languages where all-things-are-trees, on the basis of improved garbage-collection, presumably via reference counting. I don't consider that a wise path at all. GC is potentially the best tool we have whether we're measuring allocation, release, cache coherency, concurrency, or real-time properties.)

The question you should ask your self, I guess, is how efficiently your proposed approach would handle pages or regions saturated with incoming references (the canonical example being a table of interned symbols), and how well it handles common expressiveness issues: state and laziness and data-parallelism that will occasionally cause old regions to point into new ones.

Purity does not require lazy evaluation

Reference counts don't work, even for pure functional, because lazy evaluation is a form of mutation and can produce cyclic page references - and, really, most languages will have at least some mutation.

I will refrain from commenting on cache locality and similar issues, but I'd like to point out that you dismiss reference counts for pure functional languages because a particular evaluation strategy can create cycles in your heap.

Many other reasons, too. If

Many other reasons, too.

If you want call-by-need laziness (implicit or explicit), multi-core data parallelism, memoization, JIT compilation, or linearity optimizations, you'll also have potential heap-page cycles.

Basically, anything that causes you to update an older region with a reference into a newer region would hinder region-level reference counting.

re: "Thomas Lord has oft expressed [....]"

dmbarbour, you wrote:

Reference counts don't work, even for pure functional, because lazy evaluation is a form of mutation and can produce cyclic page references

This is trivially false. An (impractical, if naively done) string-based lambda-calculus interpreter can be both lazy (use normal order evaluation) and avoid circular references.

- and, really, most languages will have at least some mutation.

I think you have a cloudy view of "mutation". Obviously the interpretation of programs in any interesting computer programming language involves physical changes of state. So, I can't tell what you are saying there.


On a smaller-scale, reference counts have terrible cache-coherency and efficiency and atomic-update issues, no matter where you store them.

That rather depends on your hardware architecture and on how you use them. You are making awfully sweeping statements.


(IIRC, Thomas Lord has oft expressed interest in languages where all-things-are-trees,

Sure. Lambda calculus and combinator calculus (neither of which I think are, as they are commonly treated, a great foundation for a programming language) both demonstrate fully general computation over domains of trees. And we shouldn't be so surprised because, in the end, all computation is, by definition, syntactic manipulation and all syntax is, by the laws of physics, a domain of trees.


on the basis of improved garbage-collection, presumably via reference counting. I don't consider that a wise path at all. GC is potentially the best tool we have whether we're measuring allocation, release, cache coherency, concurrency, or real-time properties.)

I have (but in this case failed) to point out that I like the idea of graph tracing ... just not as the default thing. If my program is doing something with data best represented as possibly cyclic, mutating graphs ... I'd like a language with some libraries for that. I don't want to be in the foundation of the language, though.

Reference counts don't work,

Reference counts don't work, even for pure functional, because lazy evaluation is a form of mutation and can produce cyclic page references.

This is trivially false. An (impractical, if naively done) string-based lambda-calculus interpreter can be both lazy (use normal order evaluation) and avoid circular references.

Maybe this is a terminology issue, but lazy evaluation (to me) suggests keeping the result after evaluating it, whereas a normal-order evaluation can generally increase the number of times you evaluate an applicative expression.

[edit] Keep in mind, also, that the context of the above statement was region-level reference counts (e.g. this 64 kB region or page references that 64 kB region or page), not item-level. Many of these issues also apply item-level, but even with an acyclic language one could (with laziness or mutation or memoization) have cycles between generations or pages.

I think you have a cloudy view of "mutation". Obviously the interpretation of programs in any interesting computer programming language involves physical changes of state. So, I can't tell what you are saying there.

Apologies for the confusion. In the context of memory management, I use 'mutation' to mean that memory can change after initialization. The alternative to mutation is to restrict expressiveness to a degree that allocation, initialization, and release cycle of memory are all you need - which means that current 'live' memory state can be traced entirely to a single, global continuation. That hinders parallelism, concurrency, laziness, memoization, and the like - but does simplify memory management globally since all memory is structured in an ordered tree, rather than a graph.

I've seen it expressed far too often - even on LtU - that "pure languages only point to old data" by people enthusiastic to simplify memory management. This can be true, but is difficult to achieve in combination with other nice pure language properties.

On a smaller-scale, reference counts have terrible cache-coherency and efficiency and atomic-update issues, no matter where you store them.

That rather depends on your hardware architecture and on how you use them. You are making awfully sweeping statements.

Granted.

I'll amend that my statement is based upon multi-core shared memory desktop computing architectures in common use today, and that I include the various reference-counting strategies I know: keeping counter with data or in a separate heap-structure or array, and region-based counting (sharing reference counts for a large group of objects) implemented with a region mask or offset in each pointer or object header.

I'm also assuming fine-grained, deeply structured, mostly immutable data - a typical three to eight words per node, many of them pointers. A coarser granularity or fewer references can reduce the relevance of reference counting performance in accordance with Amdahl's law. A high rate of mutation for the regular data would render the cache-coherency issues from reference counting mostly irrelevant.

in the end, all computation is, by definition, syntactic manipulation and all syntax is, by the laws of physics, a domain of trees

I disagree. You could just as easily claim that "all computation is syntactic manipulation and therefore a domain of flat strings". It wouldn't be any more or less true than what you said.

There are no "laws of physics" preventing syntax from being graph structured. Indeed, I believe the introduction of identifiers in text-based syntax is very often to express a graph-structure. A graphical syntax can express these structures more directly, without giving name to intermediate structure.

A common issue is that we introduce artifical names (such as integers or symbols) to describe points in a graph, allowing us to enumerate edges between them. The syntax \y->y+2 means the same graphical structure as \x->x+2, but this is not captured explicitly by our notation. Similarly, a graph {(1 2) (2 3) (3 1)} might mean the same thing as {(a c) (c b) (b a)}.

The failure to capture graph structure is a considerable sources of accidental complexity in most modern languages - i.e. on par with managing trees with flat string manipulations. Basically, we end up injecting and maintaining structure where, semantically, none should exist.

To avoid that complexity, developers are forced to use meta-syntactic mechanisms - such as types and typeclasses - which of course hinders the view that computation is syntactic manipulation. If you want computation to be syntactic, then your syntax needs to precisely capture your semantics.

Similarly, we can destructure our trees a bit further by eliminating order among child nodes - i.e. by using sets.

I like the idea of graph tracing ... just not as the default thing.

I like the idea of controlling where cycles can be introduced, and supporting optimizations or additional performance properties where the cycles are not used. I also like the idea of annotating optimizations explicitly - 'soft' as a preference (warn on failure) or 'hard' (error on optimization failure) - because I believe that developers often have a better idea where optimizations should apply than would a compiler.

But I'm not much for languages that use libraries to gain new pervasive features. I believe such language design causes too many incompatibilities or inefficiencies between independently developed libraries, fracturing the language community to the point you might as well build many different languages.

fast lambda interpreter w/ ref counting

Maybe this is a terminology issue, but lazy evaluation (to me) suggests keeping the result after evaluating it, whereas a normal-order evaluation can generally increase the number of times you evaluate an applicative expression.

I think you are talking about what I would call "destructive graph reduction" or "graph reduction with in-place updating" for a normal order evaluator.

As an alternative, consider caching, instead. When you form an application as part of a reduction step, check the cache to see if that particular application has been previously seen (and is still in cache). If so the cache entry will contain some partial or complete reduction of that application. Update cache entries accordingly (that is the tricky step). Aside from the fact that associative look-up (looking up a newly formed application in the cache) is more expensive by a constant than direct memory reference, on today's architectures -- this technique, with its lack of cyclic references, can actually do (in some cases) significantly fewer reduction steps than in-place graph reduction.

Using caching instead of in-place reduction, all of your terms are still trees and benefit very much by sharing. Reference counting is quite good (simplified copy collecting would also be quite good!).

-------------

On the topic of mutation I still think you have a muddy view or, at least, I can't make sense of what you are trying to say. I don't know what you mean by "current 'live' memory state can be traced entirely to a single, global continuation," or what the significance of that would be. I don't see how it "hinders parallelism, concurrency, laziness, memoization, and the like". I don't see what "the like" is relative to that list.

---------------

I'll amend that my statement is based upon multi-core shared memory desktop computing architectures in common use today, and that I include the various reference-counting strategies I know

Ah. Ok. Honestly I don't care about such architectures other than to have performance that doesn't completely suck on them. I think those architectures are dinosaurs and a meteor is approaching, so to speak.

in the end, all computation is, by definition, syntactic manipulation and all syntax is, by the laws of physics, a domain of trees

I disagree. You could just as easily claim that "all computation is syntactic manipulation and therefore a domain of flat strings". [....]

There are no "laws of physics" preventing syntax from being graph structured.

A "flat string" is a (constrained) kind of tree. Physics affords us less constrained kinds of trees because we have three spatial dimensions -- classically a space with a three vector spanning set although quantum limits complicate that.

State in any real computation will be kept at some particular point in space. Active components (e.g., a CPU) operate on "nearby" state efficiently. Farther removed state must be copied to a nearby place to operate on it. To save the work done, it must be copied back to a farther away place. Physically, the state of actual computations we do with computers is naturally tree shaped. Indeed, note the current interest in the future of "3-D" processors, essentially trying to increase the branching factor of the tree to shorten paths to active components. It's nice that (thanks to examples like Turing machines, lambda calculus, and combinator calculus) we know that computational completeness can be achieved with a finite active element surrounded by a tree-shaped state that is mutable only near the active element. It is therefore curious that we are so smugly satisfied with RAM architectures whose performance promises are only achieved if we have relativity-violating "action at a distance" (causal, not quantum action at a distance, that is).

But I'm not much for languages that use libraries to gain new pervasive features [e.g., graph tracing for data structures that need it]. I believe such language design causes too many incompatibilities or inefficiencies between independently developed libraries, fracturing the language community to the point you might as well build many different languages.

Ok. I'm closer to Steele at what I think is the opposite end of the spectrum. I don't really know what "fracturing the language community" would mean other than perhaps someone somewhere makes less money while many, elsewhere, get more work done. Grow languages. Make a jungle. To hell with domination via the Humpty Dumpty principle (per Lewis Carroll). Nature tells us where to join and where to "fracture" - not will.

As an alternative, consider

As an alternative, consider caching, instead. [...] Update cache entries accordingly (that is the tricky step).

How do you update a cache without what, from a GC's POV, is mutation?

this technique, with its lack of cyclic references, can actually do (in some cases) significantly fewer reduction steps than in-place graph reduction

I agree. It combines some advantages of both memoization and laziness. The trick is deciding the proper cache size. For Fibonacci function, a cache of size 2 will do the job. I've used a small cache in this manner to partially intern values as I run.

Anyhow, I don't think you're speaking of region-level reference counts, which was the subject of my earlier assertion (and even explicitly mentioned in the sentence you quoted).

On the topic of mutation I still think you have a muddy view or, at least, I can't make sense of what you are trying to say.

The context was memory management. You need to think of mutation through that lens: what does 'mutation' mean from the point-of-view of a garbage collector? or to another CPU operating in parallel? I suspect you're thinking about 'semantic' mutation, but the context here is certainly mutation at the implementation layer.

A "flat string" is a (constrained) kind of tree.

It is also a (constrained) kind of set, or graph, or matrix...

Physics affords us less constrained kinds of trees because we have three spatial dimensions

Unlike trees, spatial dimensions afford more than one path from one point to another.

The qualification "less constrained kinds of trees" doesn't make much sense to me. A tree is, by definition, a constrained kind of graph. A less-constrained tree is not a tree; it is some other sort of graph.

State in any real computation will be kept at some particular point in space. [...] Farther removed state must be copied to a nearby place to operate on it.

Or you could move the computation.

To save the work done, it must be copied back to a farther away place. Physically, the state of actual computations we do with computers is naturally tree shaped.

You seem to be assuming that state is not shared, that there is no concurrency in physical computation. If two different computations interact with the same state, then the state of computation is not tree-shaped. You can artificially constrain computation to be tree-shaped, but it is not a physical limitation.

we know that computational completeness can be achieved with a finite active element surrounded by a tree-shaped state that is mutable only near the active element. It is therefore curious that we are so smugly satisfied with RAM architectures whose performance promises are only achieved if we have relativity-violating "action at a distance"

Agreed. We should instead be smugly satisfied with using Turing machines that mutate data locally on a (constrained) physical tree-shape called a 'tape'. ;-)

Anyhow, my understanding is that CPUs do exactly as you suggest: they mutate registers (near the active element) and explicitly load and store data to a distant 'memory'. But I find odd your use of the word "the" for active elements despite interrupts, MMIO, DMA, SMP, and other expressions of hardware concurrency (even before we start on semantic concurrency).

Ok. I'm closer to Steele at what I think is the opposite end of the spectrum.

Steele doesn't make pervasive support for garbage-collection and mutation and tail-call optimization and continuations and so on into 'optional' features provided by libraries. What makes you think you're on his end of the spectrum?

I don't really know what "fracturing the language community" would mean other than perhaps someone somewhere makes less money while many, elsewhere, get more work done.

It means that my libraries won't work with your application because I developed using GC, and you developed using GC', and those implementations conflict. Or perhaps the language doesn't have a consistent concurrency model, so I developed using synchronous IO and you need asynchronous IO.

It's about rework and boiler-plate. Without consistent pervasive qualities, you'll end up reinventing or adapting libraries repeatedly because each major library has different integration qualities than does its user.

Grow languages. Make a jungle. To hell with domination via the Humpty Dumpty principle (per Lewis Carroll). Nature tells us where to join and where to "fracture" - not will.

We already have a jungle - complete with warring tribes of barely-compatible frameworks and libraries and GPPLs that we must often coerce to work together for larger software projects. If you really embrace what you just told me, then your opinion on language design should be "don't bother". Nature is often short-sighted and hugely inefficient, and has a tough time escaping local maxima to discover better ways. Design is the opposite of nature.

Ha, interesting ongoing

Ha, interesting ongoing discussion to watch. :)

This is surely true : "We already have a jungle". I'm not so sure, though, the rest of the points Thomas made would allow us to infer further "[...]then [Thomas'] opinion on language design should be `don't bother'. Design is the opposite of nature.".

Take the usual, arbitrary, distinction dichotomy we make for General Purpose languages vs. Domain-Specific ones. We sure have two types of jungles there, as well, then.

Let's assume this dichotomy makes sense for the (vast?) majority of us, as far as language design, implementation, learning, debating on the formers, etc, are concerned.

Now, another question could be : for which (type of "jungle") would it presumably, still, be sensical (or, worthy) enough for reflection about the choice of, either : (a) the reassembling effort strategy or (b) the "just let them breed" opinion (in reference to Thomas' Humpty Dumpty principle illustration) ?

Put otherwise : would that choice problem make more sense for General Purpose languages or for DSLs ? (as one can obviously be interested in the long term progress in both, or either of, categories at any point time).

I don't know about you, but would I have to bet, I'd probably do so on General Purpose languages, regarding the (a) option.

Ok, I sure am aware that this is only one aspect amongst others to look at our habits in classifying languages, but seems like this dichotomy is still a pretty common one we go for, in our early approaches to such high-level considerations.

But, even if one answers the way we/I just did, does that save us from bothering about any further considerations/possible remaining issues regarding our a priori acceptance of all those DSLs to breed and come in our ballpark freely (per human's invention)?

My opinion is of course "no" : even if I'd be willing to, possibly, invest more into the "reassembling" the General Purpose realm (e.g., with this current "getting back to the roots" of FP, etc) that doesn't mean I'd neglect the other realm -- the dual of the former in that respect.

Then, the last thing I find myself realizing is about this, at the end of the day : whether it seems like a good idea to try reassemble this or that Humpty Dumpty -- or not (e.g., because many other people disagree with me) -- is maybe what doesn't matter that much.

Indeed, instead, I came to the realization that a possibly much more ambitious programme is to acknowledge that the full extent of our "raising in abstraction" regarding computer languages' intents vs. their design, there, hasn't really been investigated yet.

I suspect it's because, and unlike other maths or logical models and theories : we're maybe still focusing too much on the competition of language paradigms, disregarding the fact that our neverendingly "reifying engines" (our brains) will continue breeding them (these languages) no matter what.

Hence my personal interest in supporting this acknowledgement in the tooling support for really looking at, and managing, languages as first-class citizens. -- I found the most natural (to me) first candidates to play with, for these experiments of thought would be from that DSL jungle, as you've probably guessed. (And no, I'm obviously not trying to reassemble their designs, then -- that wouldn't be consistent with the reasoning above. Instead, the idea is to find practical ways to have their design intents, though diverse by definition, "communicate" through their implementations.)

My .02.

[Edit: and more than one year after having the idea, I have some clues that this still pretty much doesn't make sense to anybody but me... but, oh well: Hell, I don't have a deadline on that one after all/anyway ;)]

GPPL and DSL symbiosis

Your position (as I understood it) is that GPPLs should be subject to careful design, and DSLs be allowed to propagate arbitrarily (similar to libraries, datatypes, typeclasses) with the goal of ensuring that the DSLs can be assembled into a coherent and cohesive whole. With that much, I'd agree.

I'm not very fond of the common distinction between General Purpose Programming Language (GPPL) and Domain Specific Language (DSL). DSLs are not that independent. General purpose features are usually needed for corner cases, and for flexible abstraction and extension of DSLs. Different domains routinely interact.

So the relevant issue isn't separating DSL and GPPL, but rather of fusing them: how do we constrain a GPPL to support the desired domain properties? how do we extend the DSL to cover the corner cases and domain integration? where do the two come together?

If you have a bad answer to those questions, you'll end up fighting the GPPL to support the domain, or you'll be extending the DSL until you've invented a new slow, buggy, incomplete GPPL - which will compete with all the other slow, buggy, incomplete GPPL's you developed for the other DSLs. Pick one, or find better answers.

Haskell, especially with useful GHC extensions (multi-parameter type classes, type families, GADTs, functional dependencies) combine typeclasses with sandboxing DSLs inside arrows or monads (and decent performance) to make it among the very best industry-strength language currently available for growing languages and supporting DSLs and such. I would prefer a solution more suitable for open distributed programming (which means it should favor purely syntactic and structural techniques, rather than nominative typing) but that's why I pursue my own research.

I came to the realization that a possibly much more ambitious programme is to acknowledge that the full extent of our "raising in abstraction" regarding computer languages' intents vs. their design, there, hasn't really been investigated yet.

I'm not sure where the "raising in abstraction" goal comes from. The purpose of a programming language is to allow humans to express intent to a machine, and to one another (maintenance, integration, example, and explanation). There is such a thing as "too high level" when expressing intent; you'll end up fighting the language just to capture essential complexity. There is also such a thing as "too low level" when expressing intent: you'll end up expressing too many constraints - specific order and mechanism where you wanted to describe tasks and strategy, and these extra details are all accidental complexity.

Most projects aren't developed from scratch by just one programmer. Even the so called "single-developer" projects will integrate diverse libraries - which will continue to be maintained by other developers. Thus, while programming languages need to support developers in expressing intents (at whatever level is appropriate, via DSLs where appropriate necessary), they should not do so at cost to cohesiveness. To the extent you lose cohesiveness, you hurt everyone - even the 'individual' developers. It is important that libraries and DSLs and services compose without requiring the developer implementation details of those services and libraries - whether they use synch vs. asynch IO, which calls are mt-safe or reentrant or use TLS, different memory management strategies, where hold-and-wait occurs, etc. Usually, this means the libraries must use common concepts for the non-domain details.

You will need to sacrifice some flexibility if you desire cohesiveness. In particular, you lose flexibility for ad-hoc global transforms and pervasive features - not because a single developer cannot get them right (though that's hard enough) but rather because a prospering community of independent third-party developers cannot get them right. The relevant pervasive (cross-library, cross-service) features - those you'll need to design into the language up front.

This doesn't need to be a significant sacrifice. Local transforms (via macros or first-class modules/monads) are still cool. And one can still build a great deal of flexibility into the language - open functions, generics, dynamic scope, first-class continuations, structural typing, first-class modules, staged programming, reflection, annotations, etc.

tooling support for really looking at, and managing, languages as first-class citizens. -- I found the most natural (to me) first candidates to play with, for these experiments of thought would be from that DSL jungle [...] the idea is to find practical ways to have their design intents, though diverse by definition, "communicate" through their implementations.

DSLs do not need be distinct from data-types.

Languages can effectively support a flexible range of DSLs - their abstraction and extension and integration. Haskell is relatively effective for this purpose, compared to most industrial languages; OCaml's camlp4 and recent first-class modules also give it some high marks. Logic languages are also advantageous for this purpose.

The difficulty is integration of DSLs and intents - the cohesive adhesive in the middle.

Thanks for sharing your

Thanks for sharing your point of view.

I'm going to make a short enough (hopefully) answer to clarify this for you, as I suspect I have "advertised" / called for help enough already here on LtU, in my quest to find other brains to develop these ideas -- that actually only relate remotely to PL (though a bit closer to PLT, and much closer to modeling and tools).

So the relevant issue isn't separating DSL and GPP[...]

You almost understood me, but not completely. As I've said already (or meant to say, if I failed), the GPPL vs. DSL dichotomy is :

i) rather arbitrary to make, a priori and per se
ii) mostly subjective (e.g.: is XSLT a DSL or, as having full-blown FP capabilities, can be better seen as a GPPL "by some folks"?)

iii) definitely relative, anyway -- even guys like C# are DSLs in some loose, far fetched if you wish, sense : where the "domain" you want to only consider for its use is, say, "programming the computer" -- while it wouldn't be suitable for other domains, like having Mr. or Mrs Smith able to model what is his/her day-to-day business domain, at Acme Inc.

Also, in "DSL" one might argue there's also implied we like it or not some form of "CSL-dimension" -- Competence-Specific Language; we're all humans, aren't we? How come everybody cannot have a generic programmer skillset like us around here? Because, of course, we, with end-users, all use computer languages, and all the time, but it's just at different usage and tooling levels (i.e., the competences we, and end-users, have) for the same domain shared by both populations (i.e., what the computers eventually do for us and end-users). Hmm... But I mostly digress, sorry.

Hence, I am not trying to oppose DSLs to GPPLs, nor I am trying to unify them (how, anyway?) -- nor I am trying to reinvent the wheel and reduce any Turing-completeness-equipped GPPL to some sort of DSL where the domain is computation in the broadest sense (and mostly uselessly introduced as such).

I'm not sure where the "raising in abstraction" goal comes[...]

To understand this, you'll have to read my rationale closer, but to put it in a short one-liner: it's all about the issue of "the scalability of reuse" of language implementations.

It is only this, that I took/kept from history: we keep inventing languages, be they "more or less" GPPLs or DSLs, and executable logics for them to produce other "dry" artifacts (to over simplify: outputs of X becoming, thru the filesystem or whatnot, the inputs for another executable logic Y, and so on...)

Then, the lesson I've kept from watching this ongoing history (with PLs, of course, but also with modeling notations, etc) is that the fundamental lever we use during the transition from the design to the implementation of the corresponding language processing tools is : reification.

Say, you invent Java out of a small set of simple ideas: OO paradigm, plus virtual machine execution environment, etc.

Then, at some point, you figure that it'd be cool to reify the OO type system and environment you've built : there comes metadata, and a (class-)type named 'Type' in the library, the instance values of which are "structurally" the same as other application-specific types' like 'Person', 'Account', etc.

But... I'd ask candidly, forcefully very naive, pretending to have never been exposed to RTTI (Run Time Type Information) before: why such subtleties really, by the way? What's the point, there? Why did we have to bother about this kind of thing at some point (be it during the language's design itself, or proof-of-concept'ing it thru its first implementations)? What's the purpose of this kind of animals, at the end of the day, anyway? We're already able to model 'our world' of Persons, Accounts, Affiliates, etc, in the language, aren't we?

(Ok, enough rephrasing that kind of questions, you got the idea.)

Here's what I found: it's because such reifications -- and please have a look at other contexts, e.g., such as namespaces in XML, and later on, RDF that builds upon the namespace additions "trick" made concrete in XML syntax (one year only --1999-- after XML inception, in 02/1998) -- enable a fundamental technique of design and implementation of software systems we want able to cope with the general scalability problem of the technologies they're based upon:

that technique is the enforcement of an effective loose coupling between the "concrete" implementation components.

Another famous precedent is our loosely-coupled global web of information, even though some would (arguably or not) claim it isn't enough. Try remove the out-of-band mechanism that the HTTP headers part of the spec (Content-Type:, Accept:, etc) enable in the overall applicative protocol that is coined as "HTTP" and keep only the verb (GET, POST), the URI/URL, and the request body.

It's then more difficult for you to implement, practically and across yet unknown applications (i.e., arbitrarily, over time) the various HTTP-transported data payloads your application, and others' will encounter. Because you then have a more tightened architecture between HTTP clients, servers, and intermediaries -- that will have to agree with each other "at a too large extent", in their implementations, upon things mostly or totally unrelated to their respective applicative domain's interests/target purposes.

In this same instance (WWW), if the WWW infrastructure hadn't provided this standard way of dealing with that out-of-band info/metadata for the content negociation use cases (via aforementioned headers), producer and consumer applications running on top of the web infra would have had to decide on a case by case basis how to deal with the issue, one with every others.

Just extrapolate (or interpolate, actually?) further, and think of the WWW as a kind of mega-executable environment (like, approximation of a T.M, or is it? I don't really care at this point) where you know that you don't know how to have a chain of N programs you write for it terminate with a useful final computation result, where you can check each program individually, but not the chain as a whole.

Metadata, or out-of-band signaling services provided by the environment/infra and for the application enable opportunities to give clues to each program N-1 output -> program N input link about the confidence you have in the very relevance of using that link, precisely, as program N-1 yields its output (from its PoV), towards N. This is done by having an infrastructure-instantiated "metadata" marker value held at N-1's while it yields its output, and queryable from N's, as it prepares itself to process this incoming input (from its PoV) from N-1.

Think also (other example) of the meaning, brought by design of the classes at hands, of the custom attributes that decorate a method member of a web service (that will generate later on the corresponding WSDL defs) -- this is a way for the other end (the web service client and its proxy generated in turn from the web service's WSDL description) to know : "OK, that's a method that will return me a ZIP code as string, 5 character long, max." -- neither of the two parties (server and client) classes requires more truthful evidence from some hypothetical program verifier/prover that the web service's implementation actually produces ZIP codes that way and/or that the client is able to accept and process these ZIP codes -- we're just in a pattern of good faith between civilized people (server, client) speaking with a HTTP headers + WSDL XML namespace + .NET type information (custom attributes) scheme of metadata / combination of "technical politeness" hints ... just because the current state-of-art of providing computable evidences in such a multi-layered exchange can't do better, as of today. Or, even if it does (in academia papers), maybe because all the standards and/or implementations involved don't integrate all the algorithms required, yet.

Back to the HTTP example: now, the actual representation and location, itself, in the protocol message formats of these headers matters somewhat less or not at all, from a design standpoint -- what really matters is you have, at least, introduced them somewhere, upfront from the design, and for the purpose at the end of the day of enabling such loosely-coupled exchanges between your overall system's (WWW) executable components (implementation); like, e.g., the content negociation between user agents. There, content negociation, to take the same example, can been seen as a reification of an abstract "object" of your system's purpose : more specifically, a dimension of the transactions over another "object" -- the exchanged pieces (in space, time) -- hypertext content (HTML, XML, image/jpeg, etc).

These reification levers in the transition from design to implementation, seem to be always, soon or late, a necessary piece to introduce "somewhere" to have useful systems scale.

We got ourselves familiar with them when speaking about types in PLs -- my question (of anticipation) is: how would that extrapolate to other kinds (that I believe, will be coming eventually) of software systems where languages (with syntax and semantics) have become "craftable as easily as" today's class types or functions, GADTs, etc?

Thus, I learnt from practice as much as from others that loose coupling is an important "design workaround" we use when we know we won't be able, later on, about the same implementation pieces, to prove formally that the resulting semantic meets the intended purpose of the overall system -- in the cases where either isn't, or both aren't, specified anywhere, or if they are, the specifications are not completely (in Turing sense) executable/computable and intended for humans only.

Hence my "current" attempt at investigating what "metadata" (in the generic broad sense of "data about data", or "out of band information", and which is just one well-known way among others to help us reify "things" in systems) should be for languages -- and to start with, what we commonly (by some sort of habit or tradition?) classify as DSLs, when we do so thru the "looking glass" of the intended purpose (in a domain).

Does this make more sense?

[Edit: sorry, had to edit several times; but I'm done now; and I repeat, as "I know my limits" as they say, I need help about this on the topic, btw. Any brain interested, always much welcome to join!]

[...]The difficulty is

[...]The difficulty is integration of DSLs and intents - the cohesive adhesive in the middle.

Well, indeed, that's what I'm trying to address -- say, "by constructivism" on the topic. (I see a better understanding of what I was trying to express, here above near the end of your reply, than at its beginning, btw.)

Oh yes, one final note about

Oh yes, one final note about this, for people willing to try understand my idea the way I intend it to be:

DSLs do not need be distinct from data-types (*)

I used to think that way, or rather, to overlook another possibility, as far as languages are concerned... until almost two years ago, when the idea sparkled in one of my reflections re: DSLs as viewed as the central object of study and processing in modeling tool chains such as Oslo (first), and more recently re: PLT more generally -- so that I obviously doubt about this type of statement now.

But I'm much more sure that my idea is really nothing new in its object, but probably (very?) new, in its modality; which makes it even more difficult for me to explain (even to myself, in the beginning).

My basic assumptions are about i) the validity of the constructivist approach for the progress and innovation in computer language and tools, and ii) the law of the excluded middle for the logical reasoning I can use to develop i) precisely -- that is:

i) by constructivist, we accept the fact (thanks, Mr. Godel) that whatever formal system we come up with for later practical purpose in software, we just won't try to have it complete, just consistent ... and well, hopefully, in some way effectively practical and useful for the identified purpose.

There, I've given the purpose aimed at, earlier in the thread and on my pages: let us have better assurance about the scalability of language implementations reuse (in software, anyway), should the grammars/phrases of the computer languages they process -- independently of the operational paradigm, FP vs. OO vs. etc, but on T.M's anyway -- eventually take the place of today's first class object types, functions/values in executable type systems.

But also ii):

If one accepts the law of excluded middle, then either i) is pure fantasy regarding the immediate practical aspect of refuting (*), or... is not.

Now, if we look at today's situation re: executable type systems, I believe that their current, de facto nature remains for them being tangible T.M's that happen to not operate on linguistic values, but on bytes, arrays, lists, trees, etc, precisely per the current state of our technology(ies).

This belief I have is thus an immediate, direct refutation of your statement (*) above without even an intermediary step.

But then, which clue did I get recently to prefer this path of idea?

Well, I only looked back at the way PLs (or DSLs) have been thought of, during their design and implementation, as languages meant to be eventually brought to life on ... T.M's. Well, without surprise, the focus has always been so far on making sure we have "nice properties" to predict in most cases the programs' termination and/or the end result of the computations they induce in terms of lower level executable languages' properties in turn, and so on...

But!

I have yet to find one (of these languages) that bridges the gap between what the syntax and semantic of one language "running on top of" another one's proper logic is intended to be as opposed to what will be actually computable/observed at runtime. This is something we indeed have for type system execution envs like Java's or .NET's in their metadata services, tangibly, documented, etc, and available at the source (syntax) level of their compiled languages' syntax. We have it too, to some extent, in the WWW, especially when viewed via the REST thesis and all the other XML-helpers.

While, again, if you accept to think maybe more "out of the box" like I try to do, then you see that we cannot take for granted at all that this existing mechanism in whatever tangible VM you choose to consider is of the exact same nature as what I described in opening the paragraph before this one.

[Ok, I'm done for good and for a while now about this; I don't want to "pollute" more this thread about this; but thank you for having bothered reading and replying my, say, "abstract nonsense" :)]

re (answers to): "As an alternative, consider"

As an alternative, consider caching, instead. [...] Update cache entries accordingly (that is the tricky step).

How do you update a cache without what, from a GC's POV, is mutation?

This is very constrained mutation. We have tree-shaped, immutable "terms". We form "applications" (two terms joined in an application, forming a new term) -- no mutation there, although there is a "hash cons" opportunity. We have a cache that maps terms to what those terms reduce to after some indefinite number of steps. Both the hash key and the hash value in the cache are these tree-shaped terms. The cache itself is a tree. The only mutation here is adding, removing, or changing cache entries. No cycles can arise. (Say, have you looked at awk? :-)

I agree. It combines some advantages of both memoization and laziness. The trick is deciding the proper cache size.

We are separated by a common language because I wouldn't use those words that way and can barely parse your use of "laziness". Cache size is not the only tricky bit. Cache replacement / update policy is the issue that makes this technique not quite a provable silver bullet.

[Regarding physics as implying a tree of state] Unlike trees, spatial dimensions afford more than one path from one point to another.

Heh. No, that's not quite right but now I can see that you are starting to get what I'm saying.

Presumably you aren't proposing CPUs that physically move around relative to the frames of reference of memory components, right? CPU and the most interesting part of its memory share a frame of reference, for our purpose. Local (to the CPU) rearrangements of that tree are the only useful way that tree changes topology. (We would have to and can expand this way of looking at it to take care of concurrency / IO -- and get nifty results (e.g., Lampson on concurrency) -- but that's separate from what we've been talking about.).

Steele doesn't make pervasive support for garbage-collection and mutation and tail-call optimization and continuations and so on into 'optional' features provided by libraries

Sure, he makes some mistakes here and there. :-) (What makes me think I'm on "his end of the spectrum" to answer the serious question you asked is his generalized discussion of "growing languages" in which, to make his point, he does not beg the question of the essentialness of built-in graph-tracing GC. It so happens that his three most famous language designs have built-in graph-tracing GC -- but nothing in his "grow a language" arguments necessitates that.

Be Water

We are separated by a common language because I wouldn't use those words that way and can barely parse your use of "laziness".

Laziness is not directly correlated to normal-order evaluation. Perhaps you should review Wikipedia's explanations to get back in touch with the 'common language'. [1][2].

Presumably you aren't proposing CPUs that physically move around relative to the frames of reference of memory components, right?

They do that. Mobile computing and survivable networking based upon such mobility were my post-graduate subjects. But that wasn't what I was proposing. ;-)

Physical relationships between CPUs and other systems are not tree-shaped. CPUs talk to one another, and to memory, via various buses and point-to-point protocols.

More relevantly, computations are not constrained to particular CPUs, nor to tree-like relationships. Computations may involve complex, concurrent interactions between systems. Nor is the data inherently organized in trees - it can take many shapes, have many relationships, can flow, can mutate in many patterns... it be water, my friend.

There often is no clear distinction between computation and data - they blur together in a dataflow.

We would have to and can expand this way of looking at it to take care of concurrency / IO -- and get nifty results (e.g., Lampson on concurrency) -- but that's separate from what we've been talking about.

Certainly, the reason I object to your position is in part because of concurrency, so (of the two of us) at least I've been talking about it. If you weren't hearing it, then we've been talking past one another. When you said something is true based on 'physics', did you mean to exclude concurrency from physics?

What makes me think I'm on "his end of the spectrum" to answer the serious question you asked is his generalized discussion of "growing languages"

You embrace his vision without his caution? His "grow a language" argument also pontificates and rambles on how to achieve smooth, wart-free growth in the language - picking the right things to add to a language so as to aide developers in growing the language to work together in a controllable, predictable manner.

Now, there are warts and then there are warts! With care, one can design a wart so that it will not be too hard to take out or patch up later on. But if you do not take care at the start, you may be stuck for years to come with a wart you did not plan. -- Guy Steele, Growing a Language

Finally!

Be water

Where does this fit in the diamond vs. ball of mud dichotomy?

I laughed when I saw that subject line on the tracker. In hindsight, it seems incomprehensible that LtU managed to go a whole decade without once referencing Bruce Lee, until now.

Here's another one tailor-made for LtU, although controversial:

"When you understand the root, you understand all its blossoming."

Thief of Time

'Of course,' said Lady LeJean. And, inside, a thought that she hadn't known she was thinking, a thought that turned up out of the darkness behind the eyes, said: We are surely the most stupid creatures in the universe.

When in doubt, turn to Terry Pratchett ;). This is a quote from a novel written by him which dimly touched on the 'become the water' metaphor. ;)

LVI. Old and new tables, #29: the diamond & kitchen coal

Old and new tables, Nietzsche on the kitchen coal and the diamond.

Be Realistic

I am enjoying form and odd function of the back and forth here but, remember, (as far as I can tell) you set out here to defend the notion that (I'll closely paraphrase) reference counting doesn't work for pure functional languages because cycles are necessary.

I gave you the sketch of an efficient interpreter that proves that's not the case and a general principle about the physics of single CPU machines and the ultimately syntactic nature of computation (and how that maps onto the physics) to point out why we "shouldn't act so surprised" that we don't need graph tracing GC at the core of things.

I'm not sure where you are going by dragging in concurrency vis a vis any implied need for graph tracing GC. Is there some reason that you think generalizing these multiple tree-based models of computation to concurrency requires a change in GC strategy?

I think if you want to argue seriously in favor of languages that require graph tracing GC, you need to make what amounts to a cognitive psychology argument -- that it easier in general for people to reason using idioms that require graph-tracing GC and therefore worth the performance hit and/or implementation hair. I think you can make the case that graph-tracing GC is convenient for certain problems, but not in the general case.

Re:alistic

I think you understood my assertion out of context, since I was discussing region-level reference counts - which, by nature, involves a trade-off between precision and performance. You included this point in your quote without asking about it, so I assumed you understood it. I describe it further, below.

Your approach of using cache-level memoization (instead of what the rest of the world would call 'laziness') is interesting, and I'll certainly keep it in mind.

I have serious issues with an assertion you made: "in the end, all computation is, by definition, syntactic manipulation and all syntax is, by the laws of physics, a domain of trees."

You have told me that the it's a law of physics that syntax cannot be graphical. Why should I find that credible?

You pointed at spatial dimensions as a basis for trees, and I just as easily pointed out that (unlike space) trees, by definition, don't allow multiple paths from one point to another.

You claim that computation is syntactic manipulation by definition, but my understanding is that (in the general case) computations can be interactive, reactive, concurrent, logically continuous. Introduction of time, synchronization, open modularity, hidden models, etc. hinder us in reasoning about a great deal of computation as mere 'syntactic manipulation'. For example, you can model an open service-oriented architecture as pure syntactic manipulation, but you can't implement or integrate one that way.

You pointed at single-CPU machines; I find it odd that you think single-CPU machines a suitable example for generalizing laws of physics and the nature of computation, but, even if I accepted that position, the concurrency and multi-path properties of single-CPU machines (interrupts, DMA, MMIO, bus events, etc.) would interfere with my accepting your conclusions and your belief that computation, even on a single-CPU machine, is generally 'tree-based'.

Not that I considered it especially relevant. The earlier context of "cache friendly languages" and "OS is a language" really needs to involve discussion of the common-case multi-core and concurrent systems. Thrashing caches between cores is one of the issues in achieving efficient parallelism. Multi-core issues are also the reason I wasn't discussing fine-grained memory-object reference counting, earlier.

Fine-grained reference counting is not multi-core or cache-friendly and performs poorly in the following ways:

  • It costs an extra word or half-word per memory object to keep the current reference count. It is common for most objects to be relatively small. These extra reference-count words can cost us ~25% memory and cache overheads. Extra cache overheads directly result in extra cache misses.
  • Often code must touch reference-counts even when it doesn't peek at the inside of an object (e.g. for generic programming, concurrency, or generating a new persistent tree structure from an existing one). This means paging in the reference-count word for an object, which often means paging in most of that object. If two or more CPUs are managing references to the object, then updating the reference counts can thrash caches between them.
  • Reference counting implies allocation and release using free-list or similar mechanisms. Allocation using a free-list has a high overhead compared to nursery allocation. Memory will fragment over time which hurts locality of references, reducing probability that a referenced object will be found in the cache.
  • For the multi-core systems, one generally needs atomic CPU operations to increment or decrement the reference count. This is (in benchmarks I've seen) approximately two orders magnitude slower than a non-atomic regular incref/decref - presumably due to memory barriers and pipeline stalls.
  • Reference-count management is scalar and spread like peanut-butter across the entire codebase (incref, do something, incref, decref, do something, decref...), with a branch after each decref, poorly organized for performance. Most of this is wasted effort; objects tend to be accessed much more often than they are deleted. Compare: tracing GC algorithms fit in tiny loops and operate over large regions - easily fitting into cache, easily subjected to loop-unrolling CPU-pipeline parallelism, easily supporting anticipatory paging and caching (where one touches memory one will reach in the near future to get it into memory and into cache while one is processing other stuff). Incremental GC can be heuristically targeted to improve success rate: the ratio of memory freed to GC overhead.

Region-based reference counting will amortize these performance issues, but at the cost of reduced precision. A reference-count applies to a region (bag) of objects rather than an individual object. So long as someone needs an object from a bag, the whole bag is kept around. Thus, even if a trace of the individual objects would prove acyclic, it is easily possible for the aggregate regions to carry cyclic references.

The issue of cyclic references arising in the aggregate due to acyclic references between objects can be avoided (with self-discipline or integrating regions into type-safety), but doing so generally costs language expressiveness and features such as laziness (as the term is commonly used), promise pipelining, generic message passing, linearity optimizations, and data-parallel sparks.

if you want to argue seriously in favor of languages that require graph tracing GC, you need to make what amounts to a cognitive psychology argument -- that it easier in general for people to reason using idioms that require graph-tracing GC and therefore worth the performance hit and/or implementation hair.

Performance hit?

Earlier in this thread I posited that tracing GC offers better performance characteristics than does reference counting, especially at scale of multi-core systems or integrating with paged memory (virtual memory or orthogonal persistence). Graph-tracing GC also supports greater expressiveness and flexibility - which is convenient for many applications and for various other performance-improving optimizations.

Reference-counting has one performance advantage: you can make a guarantee about the amount of time between memory becoming inaccessible and that memory being available for recycle. This is a real-time property, and is very useful for non-memory resources. But this advantage comes at huge efficiency costs, which I described earlier.

I'll grant that the implementation can be a little hairy. A good tracing GC implementation needs to be aware of paging, of concurrency, ought to be incremental and ideally concurrent and real-time, and so on. OTOH, if you want 'good' reference-counting, the implementation similarly gets very hairy very quickly, using thread-local buffers and analysis for deferred reference counting and sliding views. On the other end: heap-allocation using free-lists is considerably more complicated to get efficient - especially in a multi-core setting with paging issues - than is copy-collection plus nursery allocation.

I don't need a cognitive argument because I have a performance argument. I would disfavor reference-counting even for a language that doesn't require the full expressiveness offered by tracing.

But if I were to provide a cognitive argument, it would be based upon expressiveness and concurrent, open modular applications - where one doesn't have a lot of local control about how references are managed in the larger context.

you can make the case that graph-tracing GC is convenient for certain problems, but not in the general case

Tracing GC is convenient for most purposes, most problems.

The primary exception, to my knowledge, is if you also want to collect non-memory resources and 'disposable' objects - open files and sockets, for example. Tracing GC, even real-time tracing GC, doesn't help there. Real-time GC means there is a hard upper bound of time you'll need to wait on a collector to access memory or perform an allocation, which is a very different property from an upper-bound on the time before any particular piece of garbage is recycled. Incremental GC often trade some precision - allow some small percentage of garbage to go uncollected - in order to improve performance properties.

But we don't need GC to handle non-memory resources. There are many alternative techniques and idioms. The one I'm favoring in my language design is demand-driven resource acquisition: resources will be implicitly released whenever not under active demand.

Can you think of other exceptions, where tracing GC might be less 'convenient' to its users than reference counting?

A good tracing GC

A good tracing GC implementation needs to be aware of paging, of concurrency, ought to be incremental and ideally concurrent and real-time, and so on.

Any incrementality added to tracing GC is due to reference counting. All GC algorithms are hybrids of tracing and ref counting.

I would also note that performance arguments against ref counting should take into account this duality. Roughly, ref counting only needs to touch dead objects, and tracing only touches live objects, thus, there are heap configurations where each is optimal.

Incremental GC

Under Bacon's operational definition of 'hybrid', I could make a good case for an airplane being a hybrid of a bird and a bus - because I can think of a plane as sharing important characteristics of both. I'd prefer to reserve that word for collectors that actually combine mechanisms of reference counting and tracing, as opposed to merely sharing some interesting properties. There are some true hybrids out there.

ref counting only needs to touch dead objects, and tracing only touches live objects, thus, there are heap configurations where each is optimal

Typically, reference counting touches live objects quite often. Every mutation to reference counts that doesn't result in deleting the object is touching a live object, is it not?

Typically, reference

Typically, reference counting touches live objects quite often. Every mutation to reference counts that doesn't result in deleting the object is touching a live object, is it not?

Except any serious implementation of reference counting defers those operations, so most objects are not modified at all in the normal case. See the sliding views paper I linked to elsewhere in this thread.

any serious implementation

any serious implementation of reference counting defers those operations

I suspect you'll discover, then, that most implementations of reference counting are not 'serious'. ;-)

But perhaps that isn't fair. To my understanding, the majority of reference counting frameworks used in industry are added ad-hoc to traditionally non-GC'd languages, e.g. via smart pointers or explicit hand-specified incref/decref operations in C language. Tracing implementations, and most 'hybrids', OTOH, are nigh universally separate from the code, allowing them to evolve with a compiler or interpreter.

I agree that we can do a lot better than traditional reference-counting by deferring the counts, sacrificing precision, increasing latencies between objects becoming garbage and being collected, running "trial" decrefs to eliminate suspected cycles, etc.. But my understanding is that these features are not typical among reference counting implementations.

ref counting vs. .....

Roughly, ref counting only needs to touch dead objects, and tracing only touches live objects, thus, there are heap configurations where each is optimal.

The discussion is confused on this point. The disagreement isn't so much about reference counting vs. copying vs. mark/sweep etc as it is about the need or lack of need to be able to collect cycles.

The original assertion I objected to was that reference counting could not work for a pure functional language.

The original assertion I

The original assertion I objected to was that reference counting could not work for a pure functional language.

I never made that assertion. I said that region-level (or page-level) reference counting does not work for pure functional languages. You even quoted the statement on inter-page references! The basic issue has always been that cycles can exist between regions even when they don't exist at the finer 'object' scale.

re original assertion

This is what you said that I think is flat out wrong. The question of "region-level (or page level)" is not relevant:

Reference counts don't work, even for pure functional, because lazy evaluation is a form of mutation and can produce cyclic page references - and, really, most languages will have at least some mutation.

You can do regions just fine without inter-page cycles because you have such strict limits on mutation (e.g., only to "registers", so to speak, in a classic combinator language)

Re: original assertion and context

I agree with your claim. If you sufficiently restrict implementation-layer mutation, to the point of hindering other useful optimizations (data-parallelism, linearity or uniqueness types, call-by-need laziness, etc.), then you could guarantee that old pages never point into new pages. You could even still achieve concurrency via temporal logic (FRP), and a sort of task parallelism.

But I don't consider the context of my earlier statement to be irrelevant. The question of "region-level (or page-level") was in the title of the response you quoted, and the context of language-as-OS and multi-core precedes even that. Contextual issues are mentioned in the sentences before and after the one you quoted. It's even mentioned within the sentence you quoted (can produce cyclic page references). I'd rather not play games of calling context 'irrelevant'. Misunderstanding the scope of an assertion by accident is natural, but to ignore context intentionally is just disrespectful.

answers to dmbarbour (re: re:alistic)

Maybe we should "get a room" (e.g., take this off blog for now) soon if we're going to continue. I think it's interesting but I'm not sure it's quite right for LtU. Still, here's a response.

You have told me that the it's a law of physics that syntax cannot be graphical. Why should I find that credible?

I'm not saying anything exotic. Suppose that you have a "graphical syntax", ok? I hand you a sheet of paper and you write an expression on the paper in this syntax (or I hand you a ball of yarn and some beads, or whatever you like). However you record your expression in your "cyclic syntax" the cycle still exists only in the imagination. The actual physical state is a bunch of parts having space-like separation. The "cycle" is there only how you think about it, not in what it physically is. If you describe a computation - a transformation of you expression into others following some rules - you're still moving around these objects that don't "in their essence" have any cycles. This seems like an almost tautological point and I'm not sure why it bugs you so.

You pointed at spatial dimensions as a basis for trees, and I just as easily pointed out that (unlike space) trees, by definition, don't allow multiple paths from one point to another.

Yes, I messed up there. I should have said singly rooted directed acyclic graphs. The thought-bug that led to the mistake is that when I think of immutable trees, I presume sub-tree sharing -- but the less misleading name for that would be something like singly rooted directed acyclic graph.

You claim that computation is syntactic manipulation by definition, but my understanding is that (in the general case) computations can be interactive, reactive, concurrent, logically continuous.

This is a legitimate language usage issue, I think. Please consider me (to see the language issue) to have said calculation rather than computation.

For example, a (traditional) stereo amplifier is a particular kind of analog computer. It computes an approximation of a multiple of some input signal. It does not, though, calculate that approximation. Rather, it uses the input signal to regulate various other analog processes and, analytically, we can see that the output is an approximate multiple of the input. In contrast, calculations involve symbolic manipulations in discrete (though not necessarily single-threaded) steps.

"Calculation" is syntactic manipulation by definition.

For example, you can model an open service-oriented architecture as pure syntactic manipulation, but you can't implement or integrate one that way.

Hmm. Do you suppose that if I had a cluster of machines then perhaps my syntactic model might be executable? I don't see why not. (I want to make the joke, but, I don't want be so snarky.... the joke being "we could call the Internet" ... Sorry. I really don't mean to be snarky, it was just too funny to pass up. :-)

Back to real stuff ...

I find it odd that you think single-CPU machines a suitable example for generalizing laws of physics and the nature of computation,

Well, this is a bit sweeping but I'll say it anyway. We (as a species, as a culture, as a line of thought, whatever) are (almost certainly) only ever going to program a finite number of devices that operate on state spread over a finite area. For some purposes, therefore, things that are true about the single CPU case are true for computation in general.

You are very right to call me out about "tree" though:

f I accepted that position, the concurrency and multi-path properties of single-CPU machines (interrupts, DMA, MMIO, bus events, etc.) would interfere with my accepting your conclusions and your belief that computation, even on a single-CPU machine, is generally 'tree-based'.

Singly rooted DAG will do. Doesn't have to be a tree. Again, when I said "tree" I was assuming sharing (and, consequently, multiple paths to various points in space).

Fine-grained reference counting is not multi-core or cache-friendly and performs poorly in the following ways:

No need to touch that one. Naasking pointed out some stuff and, more generally, it's irrelevant to the original discussion which was about whether ref counting could not work for a pure functional language because it doesn't handle cycles. You said so and I say otherwise.

On this, I think you are just mistaken but I don't have anything to add to what I said earlier as to why:

The issue of cyclic references arising in the aggregate due to acyclic references between objects can be avoided (with self-discipline or integrating regions into type-safety), but doing so generally costs language expressiveness and features such as lazines

So, sorry. Similarly, this:

Can you think of other exceptions, where tracing GC might be less 'convenient' to its users than reference counting?

asks me to defend a position I did not take. (I would argue that GC techniques that don't have to deal with cycles or random access mutation can be more convenient in the sense of being generally less expensive by multiple metrics -- but that isn't what you are asking me to defend.)

Cycles and Syntax

This will be my last response on the subject, because I agree it's time to drop it (before it squeezes into the right half of my display).

However you record your expression in your "cyclic syntax" the cycle still exists only in the imagination. [...] This seems like an almost tautological point

You should apply your argument first to your own position. Any "tree" or "string" also exists only in your imagination - a subjective model for a physical configuration in space-time, subject to all sorts of philosophical bundle-substance issues. You cannot use such 'tautologies' to argue that syntax is tree-based rather than cycle-based.

I can think of only one rational basis for arguing trees vs. cycles vs. flat strings vs. sets, etc. is the basis for a computation model or syntax - and that is to associate elements of a computation model (or its implementation - morphism between models) with points in a graph, then determine relationships between those points. If the relationships form a strict DAG, then the computation is a DAG - at least in the selected relationships. We always qualify one model by describing it in terms of another.

Applying this to textual syntax, we might associate names with points and cycles are expressed by semantic relationships between names (recursion, fixpoints, etc.). For a graphical syntax, cycles are expressed more explicitly by directed or bidirectional lines representing relationships between explicit nodes. At the structural level, one might similarly use 'processes' or 'objects' or 'services' as points, and relationships between them would be based upon communication. Once again, cycles may exist. In context of garbage collection, of course, what matters is organization of atomic memory objects and dependencies - usually represented as pointers - between them.

To whatever extent spatial-temporal patterns (such as cars and weather and FPGAs) "exist" outside our subjective imaginations, cycles may "exist" in our syntax, our calculations, our computations. Certainly, if they didn't exist, we would not need to carefully restrict expression when we desire to avoid them.

For some purposes, therefore, things that are true about the single CPU case are true for computation in general.

I agree.

Naturally, those few purposes constitute a relatively small subset of those that most interest me ;-). A single CPU isn't subject to mobility and relativistic temporal drift, doesn't experience disruption or network partitioning, is very restricted in parallelism (but can handle concurrency, via context switching), has fewer security and ownership and trust issues.

My interests make me, perhaps, too-quick to focus on the differences rather than the similarities. I apologize for any frustration this has caused.

Fine-grained reference counting is not multi-core or cache-friendly and performs poorly in the following ways

No need to touch that one. Naasking pointed out some stuff and, more generally, it's irrelevant to the original discussion which was about whether ref counting could not work for a pure functional language because it doesn't handle cycles. You said so and I say otherwise.

Personally, I consider you to be disrespectful of the context which scoped my statement. At first, I thought it was misunderstanding - perfectly forgivable - so I've made efforts to clarify and explain why I was not talking about plain 'ref counting'. But you have, thoughtfully, corrected my opinion on that particular matter.

If I later argue that modern circuit boards don't work while immersed in salt water, I'm certain you'll feel free to extract "circuit boards don't work" and consider the rest "irrelevant". I'm sure you'll have just as much success as you did here.

convenient in the sense of being generally less expensive

"Convenience" rather strongly connotes human productivity, especially among the users rather than the providers. Since you meant 'convenience' in some way that I don't see in any of the dictionaries I'm perusing, then... I apologize for not asking what you actually meant. I should have known better after you used 'laziness' to refer to any old 'normal-order' reduction. (Paul Hudak patiently explains that lazy evaluation is a particular implementation strategy for efficiency in 1989, referring to work by Wadsworth in 1971.)

I wonder how many other arguments we're having are secretly about definitions and misunderstandings (intentional or otherwise, on my part or yours)?

partial apology

Personally, I consider you to be disrespectful of the context which scoped my statement.

I apologize for inadvertently causing that perception. It's all a big muddle now because I think I did initially miss an aspect of your comments and thus didn't respond directly enough to them -- and I also think that while that (my fault) bug kicks up some mud... what I said was basically right and what you were defending... not so much right (in my opinion). But there is a bunch of sloppy discussion rather than a real conflict of ideas that brought that about and I'm good for at least half the responsibility for such. We imperfectly kicked around some ideas and I at least (and I suspect both of us) leave with more than we had on entry.

The main thing I want to say here and now is I'm sorry I seemed disrespectful to you because I'm pretty sure I'm not (not quite). I found what you wrote helpful to my thinking.

Accepted

I don't condemn you to being fully at fault. Miscommunication requires two parties, and I agree we've both been sloppy. I still disagree with your loose definition of 'laziness', and I still find your arguments in favor of fundamental physics-inflicted 'tree-based' syntax/calculation/computation to be inconsistent.

I agree we can develop Turing-complete languages without cyclic data (the functional subset of at least one language I've developed qualifies: total functional). The question in context is ratcheting the performance of such to support kernel-level performance. I don't consider regular ref counting to even be on the table as a viable option, which is why I did not talk about it until you forced the issue. Using regular (fine-grained) reference counting it's easy to avoid cycles with a sufficiently restricted language. Sandro mentions a few hybrid options that eliminate the most significant advantage of reference counting in order to improve performance. But region-level - which was clearly the context of the earlier discussion - is a bit harder: old pages may obtain pointers to newer pages when you use traditional lazy (call-by-need) evaluation, or if you use linearity optimizations, or if you use data-parallel evaluation, or if you use JIT compilation. If you're aiming for performance, you'll probably want at least one of these, maybe all of them. Laziness was mentioned as just one example of why you might want mutation. And all this is before considering concurrency.

You have been attempting to develop a programming model that will support acyclic memory to simplify garbage collection and improve 'convenience' by which you oddly mean 'performance' (with unknown metrics - performance properties break down into utilization, efficiency, latency, scalability, predictability; parallelism is part of utilization - using all the available CPU resources). If performance is your goal, then I hope I've given you a lot to think about.

I think we can leave it here, with respectful disagreement, much of which is probably due to stubborn terminological or philosophical issues.

Hadn't noticed this comment

On a smaller-scale, reference counts have terrible cache-coherency and efficiency and atomic-update issues, no matter where you store them.

Bacon et al's on-the-fly collector based on reference counting with sliding views does not mutate reference counts, but instead updates a thread-local buffer in a manner similar to "deferred reference counting". Performance-wise, it's almost competitive with standard JVM concurrent collectors, and is one of only a small handful of fully concurrent/on-the-fly collectors I know of.

Anyhow, my point is that I never did figure out how to get region-based management to work efficiently on its own, especially if I wish to keep expressiveness.

There are a lot of complex analysis problems surrounding region-based memory management, so a tracing/region hybrid seems workable to me at the moment, ie. tracing at the level of regions, not objects.

I'm still learning about this domain though, so perhaps the inference algorithm isn't sufficiently clever to correctly scope the lifetime of various regions, eg. inappropriately assigning program scope. Perhaps this is what you meant with regards to expressiveness, since guiding inference may require annotations.

Trickier than it looks

First, region assignment for effective GC is very very hard. Second, it gets tangled up in effects: we might now need different specializations of map for functions f that heap allocate vs. functions f that region allocate.

I'll take any reliable edge that I can get, and I have no objection to regions for many purposes. I'm also an engineer: I'm happy with complexity-sane solutions that exploit things like region allocation opportunistically.

But complexity is the mother of most evil in software systems, and there are a variety of other approaches in modern GC systems that don't seem to need this sort of complication.

In practical terms, I'm much more worried about the fact that GC'd languages generally require 3x as much heap as languages using explicit deallocation. This obviously exacerbates any paging issues, but it makes GC'd languages completely impractical for, say, a Google/Amazon/whatever server farm. They simply can't add more physical memory without exceeding a their $$ budget for power, and they are all looking for ways to cut the current power budget by a factor of four or more.

Lively Kernel

Re: comment-62084:

We're not far from a circumstance wherein a personal computer can get by running absolutely nothing at all other than a web browser that supports javascript.

A veiled reference to the latest project that Ingalls has been working on?

The veil you are lifting

is probably not Thomas's princess.

Thomas Lord has been working on his own sort of Lively Kernel project for years now. He posted a design document for it on LtU a few years ago, but it has been taken down. It was codenamed Flower.

Small correction

Similarly, bang for buck-wise, early unix workstations soundly beat late lisp machines. It started to become apparent that lisp performance could be somewhat OK (for some purposes) on "stock hardware", especially if it isn't charged with replacing the OS, and again: hardware constraints pushed a C kernel underneath lisp.

Actually, early workstations soundly beat late lisp machines in spite of the presence of a C kernel. Several things converged at more or less the same time:

  • a set of compilation, runtime, and GC techniques emerged that didn't favor tagged arithmetic
  • the reliance on non-commodity memory word sizes caught up with the various lisp machine implementations
  • Register allocation techniques improved, making the memory-memory architectures of the custom lisp engines progressively slower on a relative basis.

So what's actually surprising is that the introduction of a user/supervisor privilege separation, with the data motion costs that this entailed in the architectures of the day, didn't buy the custom machines another generation of life. But it didn't: the impact of register allocation and commodity memory (and therefore DMA) was that significant.

not how I recall it.

That's not how I recall it.

Non-lisp hardware, as I recall it, revved faster and could win some highly selective benchmarks but tagged architectures were by no means the losers other than being less funded to rev as quickly.

But it didn't: the impact of register allocation and commodity memory (and therefore DMA) was that significant.

I wouldn't quite put it that way. Prbly the wrong thread to hash it out at length though, if we're to stay in the neighborhood of topic? Commodity everything was the economic thing that dominated -- CPU, Memory, bus interface, ... which ones were picked was not so much narrowly tied to register allocation and DMA.

languages should provide os-like abstractions

I was thinking about your comment this morning, and came to a similar conclusion as Tom did -- that perhaps Ingalls was right in the sense of the language becoming the new operating system.

Of course with some library abstractions one can make a program oblivious to running on Windows or Unix, but I think the point goes deeper: that one should be able to implement abstractions in a language that look like operating system abstractions.

Specifically, I'm thinking of Kiselyov and Shan's Delimited continuations in operating systems.

Back from the Dead

This paper is very on topic here: Programming Languages as Operating Systems.

what language-as-os means to me

I understand "language as OS" to mean two things:

First, just as you say, OS features should be presented within the languages abstractions.

Second, is to have most of the environment (OS features and all) actually written in the language, in a portable way (like Smalltalk-80).

For example:

a) does your language offer a good "explanation" for asynchronous I/O? (Are the abstractions good?)

b) can you reasonably code a network stack, and a file system or other data base storage manager in it? (Is it fast enough?)

The smaller, simpler, and "cleaner" you can make the non-portable core part of the run-time system, the greater the liberties you likely afford yourself to run on non-standard hardware or in non-standard environments.

the Forth approach to operating systems

Re: comment-62077: “…Daniel Ingalls sounds like Chuck Moore.”

Turns out, this observation has been made before. The most promising Google hit on this topic seems to be The FORTH Approach to Operating Systems (1976) by Elizabeth D. Rather and Charles H. Moore. To lift a quote:

In 1973, Koudela (The Past, Present, and Future of Minicomputers: A Scenario) listed several desirable features of minicomputer software which can simplify the programmer's task:

  • Integration of system functions such as language processors, operating systems, utilities (e.g. for editing and debugging) and libraries into a single package through which the user can conversationally interact with the hardware;
  • Programmer interaction with source language only;
  • Transparent (to the user) translation of the source to machine language for high speed execution;

  • Ability to modify or extend such a system easily for specific applications.

FORTH is such a system, including in a single, integrated structure an interactive hlgh-level language, macro assembler, multlprogrammed OS and utilities, plus some other concepts such as reverse Polish Notation, stack organization and virtual memory for data storage.

Previously on LtU:

Mea Culpa

Turns out, this observation has been made before — I strongly suspect subconscious plagiarism...

I interpret these comments

I interpret these comments as an ideal of personal computing where every "user" becomes instead a master of their computing environment, i.e., a programmer/user. I would include Ted Nelson's "Computer Lib" philosophy in the same camp.

I still believe it's a beautiful and empowering ideal worth pursuing and promoting, although the current movement of computing to "the grid" far beyond individuals' detailed understanding and control is, by this criteria, many steps in opposite, "wrong" direction.

Isn't that from the issue w/the hot-air balloon on the cover?

I still have that issue somewhere...
That article inspired that same year to try and write a Smalltalk for the TRS-80 in assembler - quite a project that was never finished. It's not the bytecode interepreter or the parser that was necessarily the hard part - it was getting a VM in the "magic" state w/o any bootstrapping, including all attendant pre-existing instances, classes, meta-classes, etc., and all the relationships between them that were necessary before life could be breathed into the system for the first time. (Never mind that w/just the assember and 48K of RAM, it wasn't going to be easy, if even possible.) I could have tried it in C, but at the time, working with the C compiler (even on the newly released PC) meant running cpp, cc0, & cc1 seperately - and swapping floppy discs each time! I gave up on it, and did end up writing a FORTH and a Lisp, which took less time together then I'd spend just w/the prelimnaries for the ST.

BYTE Magazine, August 1981

Re: comment-62083: I still have that issue somewhere…

The hot-air balloon cover is reportedly by Robert Tinney. Scan it before it turns to dust.

shark-infested waters

Turns out, somebody did scan it already: Byte_1981_08_150dpi.pdf, 85MB, 171 pages, scanned in color at 150DPI. In Introducing the Smalltalk-80 System, Adele Goldberg gives us the scoop on the drawing of the cover:

The [Aug '78] issue was a special on the language Pascal, so [then editor Carl Helmers] took the opportunity to present Pascal's triangle as drawn by artist Robert Tinney. The primary allegory of the cover was the inversion of the Bermuda Triangle myth to show smooth waters within the area labeled “Pascals Triangle.” In explaining the allegory, Helmers guided the traveler through the FORTRAN Ocean, the BASIC Sea, around the Isle of BAL, and up the Land of Smalltalk:

Traveling upward (in the picture) through heavy seas we come to the pinnacle, a snow white island rising like an ivory tower out of the surrounding shark infested waters. Here we find the fantastic kingdom of Smalltalk, where great and magical things happen. But alas … the craggy aloofness of the kingdom of Smalltalk keeps it out the mainstream of things.

This month's cover … depicts the clouds clearing from around the kingdom of Smalltalk, and, with banners streaming, the Smalltalk system is taking flight in the mainstream of computer programming community. This cover was also executed by Robert Tinney…

Robert Tinney reminisces:

… in 1981, it was Smalltalk's turn to be featured on the cover of BYTE. Carl told me that since the Smalltalk developers had taken good-natured “offense” at their representation in the previous cover, this time they wanted Smalltalk to be shown at the moment of liberation from the ivory tower. I'm not positive, but I think it was the Smalltalk team itself who came up with the idea of using the image of a balloon. Anyway, I painted the same cliffs and ivory tower from the previous cover, only this time they are being left far below as Smalltalk escapes its cloistered image and ascends to the heights of popular appeal!

What's BAL?

Visionary languages

I also agree that a department on visionary languages would be a good thing. I'd add Forth, APL, and Lisp to the list along with Smalltalk.

certifiably visionary Alan Kay

Re: comment-62089: a department on visionary languages would be a good thing.

A Conversation with Alan Kay (ACM Queue vol. 2, no. 9, Dec/Jan 2004-05, PDF) comes highly recommended by a member of Queue's Editorial Advisory Board, Bryan Cantrill:

I read a few of the articles. I found them to be surprisingly useful — almost embarrassingly so. I sheepishly subscribed, and I found that even the articles that I disagreed with — like this interview with an apparently insane Alan Kay — were more thought-provoking than enraging.

The interview was conducted by Stuart Feldman, the original author of make, who is of the opinion that

… we have one big language and one smaller language every decade — that appears to be all the field can afford. Smalltalk is one of those five- or 10-year events.

Alan Kay says that Smalltalk is dead:

Perhaps it was commercialization in the 1980s that killed off the next expected new thing. Our plan and our hope was that the next generation of kids would come along and do something better than Smalltalk around 1984 or so.

…I would compare the Smalltalk stuff that we did in the ’70s with something like a Gothic cathedral. We had two ideas, really. One of them we got from Lisp: late binding. The other one was the idea of objects. Those gave us something a little bit like the (ogival) arch, so we were able to make complex, seemingly large structures out of very little material, but I wouldn’t put us much past the engineering of 1,000 years ago.

…actually both Lisp and Smalltalk were done in by the eight-bit microprocessor—it’s not because they’re eight-bit micros, it’s because the processor architectures were bad, and they just killed the dynamic languages. Today these languages run reasonably because even though the architectures are still bad, the level 2 caches are so large that some fraction of the things that need to work, work reasonably well inside the caches; so both Lisp and Smalltalk can do their things and are viable today. But both of them are quite obsolete, of course.

It’s well known that I tried to kill Smalltalk in the later ’70s.

…the problem is—I’ve said this about both Smalltalk and Lisp—they tend to eat their young. What I mean is that both Lisp and Smalltalk are really fabulous vehicles, because they have a meta-system. They have so many ways of dealing with problems that the early-binding languages don’t have, that it’s very, very difficult for people who like Lisp or Smalltalk to imagine anything else.

Basically what happened is this vehicle became more and more a programmer’s vehicle and less and less a children’s vehicle—the version that got put out, Smalltalk ’80, I don’t think it was ever programmed by a child. I don’t think it could have been programmed by a child because it had lost some of its amenities, even as it gained pragmatic power. So the death of Smalltalk in a way came as soon as it got recognized by real programmers as being something useful; they made it into more of their own image, and it started losing its nice end-user features.

Long live Squeak.

Perhaps, I should've just linked to the earlier discussion.

beyond Smalltalk

Re: comment-62482: “Alan Kay says that Smalltalk is dead.

In a comment on the The Myths of Object Orientation, Z-Bo quoted from an old post by Alan Kay on the Squeak mailing list:

… whereas at PARC we changed Smalltalk constantly, treating it always as a work in progress — when ST hit the larger world, it was pretty much taken as “something just to be learned,” as though it were Pascal or Algol. Smalltalk-80 never really was mutated into the next better versions of OOP. Given the current low state of programming in general, I think this is a real mistake.

I would suggest that more progress could be made if the smart and talented … would think more about what the next step in metaprogramming should be — how can we get great power, parsimony, AND security of meaning?

Note to self: Kay's post is part of a larger thread on prototypes vs. classes initiated by Steve Dekorte and extended by Patrick Logan, David N. Smith, and Dan Ingalls, among others.

future work

The same issue also features an introduction to The Smalltalk Environment by Larry Don't-Mode-Me-In Tesler who, as you recall, was a major influence on Vassili Bykov's work on Hopscotch. Tesler writes:

Most compilers can generate more efficient object code if types are declared. Existing implementations of Smalltalk cannot take advantage of type declarations. We expect that future versions will have that ability. At that time, type declarations may be added to the language. They probably will be supplied by the system rather the user, using a program-analysis technique called “type inference.”

Seems like Bracha has fulfilled some parts of this program. (Strongtalk and mixins in Newspeak come to mind.) Now that Ingalls and Bracha both work at SAP Labs (seemingly a place where many contrarian minds have found refuge) interesting things may come out of their occasional chats by the water cooler.