Are "jets" a good idea?

I've noticed the beginning of a trend in hobbyist/amateur programming language theory. A number of recent programming languages have extremely minimalistic semantics, to the point of being anemic. These languages typically define function types, product types, the unit type, perhaps natural numbers, and not much else. They are typically capable of universal computation, but executing programs in them directly would be comically slow, so their implementations recognize special strings of code and replace them with implementations written in a faster meta-language. This pairing of a recognized string of code with its efficient replacement is known as a "jet", or an "accelerator".
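
To make the mechanism concrete, here is a minimal sketch in Haskell of what jet dispatch amounts to (all names are illustrative; none of these languages literally evaluate de Bruijn lambda terms like this):

```haskell
-- Toy jet dispatch: before evaluating an expression step by step, the
-- interpreter checks whether it is syntactically identical to a known
-- reference definition, and if so runs a fast native implementation.
data Expr = Var Int | Lam Expr | App Expr Expr
  deriving (Eq, Show)

-- Reference definition of addition on Church numerals, in de Bruijn
-- notation: add = \m n f x -> m f (n f x)
refAdd :: Expr
refAdd = Lam (Lam (Lam (Lam
  (App (App (Var 3) (Var 1))
       (App (App (Var 2) (Var 1)) (Var 0))))))

-- The "jet": recognized object-language code is paired with an efficient
-- meta-language (here: Haskell) replacement; unrecognized code falls
-- through to the slow interpreter.
jetted :: Expr -> Maybe (Integer -> Integer -> Integer)
jetted e
  | e == refAdd = Just (+)   -- native addition instead of Church arithmetic
  | otherwise   = Nothing
```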

Some languages that do this:

  • Urbit's Nock, which, other than the jet concept, I honestly think is either a strange art project or snake oil, so I won't discuss this further.
  • Simplicity, a smart contract language for blockchain programming by Blockstream.
  • David Barbour's Awelon, which is the most minimalistic of the three, defining only function types!

My question is this: is this idea of "jets" or "accelerators" a serious approach to programming language implementation? There's an obvious question that comes to mind: how do you know the object language program and the meta-language program you've replaced it with are equivalent? Simplicity claims to have done a number of Coq proofs of equivalence, but I can't imagine that being sustainable given the difficulty of theorem proving and the small number of developers able to write such a proof. Urbit takes a rather cavalier attitude towards jet equivalence by advocating testing and simply treating the jet as the ground truth in the case of a conflict. And I know David posts here so I hope he will respond with Awelon's take on this matter.

Here is an excerpt from the Simplicity document arguing in favor of jets:

• Jets provide a formal specification of their behavior. The implementation of a jet must produce output identical to the output that would be produced by the Simplicity expression being replaced. There is no possibility for an ambiguous interpretation of what a jet computes.
• Jets cannot accidentally introduce new behavior or new side effects because they can only replicate the behavior of Simplicity expressions. To add new behavior to Simplicity we must explicitly extend Simplicity (see Section 4).
• Jets are transparent when it comes to reasoning about Simplicity expressions. Jets are logically equal to the code they replace. Therefore, when proving properties of one's Simplicity code, jets can safely be ignored.

Naturally, we expect jetted expressions to have properties already proven and available; this will aid reasoning about Simplicity programs that make use of jets.

Because jets are transparent, the static analyses of resource costs are not affected by their existence. To encourage the use of jets, we anticipate discounts to be applied to the resource costs of programs that use jets based on the estimated savings of using jets.

When a suitably rich set of jets is available, we expect the bulk of the computation specified by a Simplicity program to be made up of these jets, with only a few combinators used to combine the various jets. This should bring the computational requirements needed for Simplicity programs in line with existing blockchain languages. In light of this, one could consider Simplicity to be a family of languages, where each language is defined by a set of jets that provide computational elements tailored for its particular application.

In the interest of charity, I'll also try to make an argument in favor of this approach, although I remain skeptical: an extremely minimal programming language semantics means a language designer can truly consider their work to be done at some point, with no further room for improvement. A minimalistic Python would have largely avoided the fiasco involved in the upgrade from Python 2 to Python 3, by pushing all of the breaking changes to the library level and allowing for more gradual adoption. Language features added to JavaScript in recent years (like classes, async/await, modules, and so on) would likewise have been library and presentation-layer additions instead of breaking changes at the implementation level.

Jets are a good idea

To avoid double standards, do keep in mind what you're comparing jets against in actual practice: FFIs, built-in functions, ad-hoc types, and informally verified compiler optimizations or rewrite rules. Which of these are clearly better when it comes to reasoning about equivalence?

Urbit's developer provides a decent response to your query along these lines in the "common objections to Urbit" document, partially replicated here:

The correct way to think about jets is not to compare them to a perfect interpreter implemented by unicorns. Jets are a combined replacement for two other techniques in language implementation: intrinsics and a foreign-function interface.

Compared to both intrinsics and an FFI, your jets have one key disadvantage: they need to be specified in pure code. You may not have this executable specification, which means you need to write it. Is this really so bad for your development process? An executable specification? Which can be trivially tested against the efficient implementation?

Compared to intrinsics, jets give you true separation between mechanism and policy; only the interpreter, not the programmer and not the compiler, needs to know what's accelerated.

Compared to an FFI, jets preclude I/O, so can be trivially sandboxed. Except that the programmer needs to mark routines which may benefit from optimization with a hint, here too we separate mechanism (efficient implementation) from policy.

Yes, formal verification of jet equivalence is desirable. However, in the real world, many specifications, including quite complex ones, are tested into equivalence by brute force.

That said, jets (accelerators) aren't free and shouldn't be introduced casually, because each one is a significant implementation, integration, backwards-compatibility, and portability burden.

For example, I have an option to accelerate lists by representing them as arrays. I can track sharing, copy on first write to shared arrays, and generally perform in place updates by accelerating the indexed update function on lists. If I do this, then code will hopefully be written that takes advantage, otherwise my efforts would have been wasted. But once code takes advantage, it cannot be ported (in practice) to any other runtime or compiler that lacks the acceleration, and removing the feature will hurt real human users. Meanwhile, if I reject the accelerator, programmers will likely develop some clever means that's almost as good, possibly even better for many use cases - such as an IntMap trie or a finger tree - and that doesn't require array accelerators.
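
For a rough sense of what that list accelerator involves, here is a sketch in Haskell (assuming the vector package; a real runtime would track sharing so the accelerated update can be done in place rather than on a copy):

```haskell
import qualified Data.Vector as V
import qualified Data.Vector.Mutable as MV

-- Reference semantics: indexed update on a plain list. An O(n) walk
-- plus O(n) allocation on every update.
updateList :: Int -> a -> [a] -> [a]
updateList i x xs = take i xs ++ x : drop (i + 1) xs

-- Accelerated representation: the same list held as an array. V.modify
-- copies once and mutates the copy; a runtime that tracks sharing can
-- skip even that copy when the array is uniquely referenced.
updateArr :: Int -> a -> V.Vector a -> V.Vector a
updateArr i x = V.modify (\m -> MV.write m i x)
```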

Fortunately, it doesn't take many (well chosen) accelerators before we can generally match performance of conventional scripting languages.

IMO, an intriguing approach to accelerators is to target evaluation of DSLs that can operate under specialized constraints. We can target DSLs to GPGPU, FPGA, cloud computing, and so on. A safe deterministic subset of a C-like language (or OpenCL) targeting a CPU and a bounded memory buffer is no trouble at all and can mitigate the need for many other accelerators. For example, we don't need to specifically accelerate secure hash functions or regular expression matching, because those can be written in our C-like DSL.
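
A sketch of the "accelerate one DSL instead of many functions" idea, with illustrative names (this is not Awelon's actual design): a tiny, total, C-like accumulator machine over a bounded byte buffer, whose interpreter is the single accelerated component.

```haskell
import Data.Bits (xor)
import Data.Word (Word8)
import qualified Data.Vector.Unboxed as U

-- A deliberately tiny, total, C-like DSL: an accumulator machine over a
-- bounded byte buffer. Indices wrap into the (nonempty) buffer, so every
-- program terminates and stays in bounds -- safe to run at native speed.
data Op = Load Int | Store Int | XorC Word8 | AddC Word8

step :: (Word8, U.Vector Word8) -> Op -> (Word8, U.Vector Word8)
step (acc, buf) op = case op of
  Load i  -> (buf U.! (i `mod` U.length buf), buf)
  Store i -> (acc, buf U.// [(i `mod` U.length buf, acc)])
  XorC c  -> (acc `xor` c, buf)
  AddC c  -> (acc + c, buf)

-- One accelerator for the whole interpreter, instead of one per function.
run :: [Op] -> U.Vector Word8 -> (Word8, U.Vector Word8)
run ops buf = foldl step (0, buf) ops
```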

Since we don't need many accelerators to cover most use cases, the verification burden can also be kept to a reasonable level. We only need to be cautious and considered in curation. Urbit has made some mistakes here, but we don't need to repeat them.

Broken Jets

I am a bit confused by this:

But once code takes advantage, it cannot be ported (in practice) to any other runtime or compiler that lacks the acceleration, and removing the feature will hurt real human users.

I thought the point of Jets was that you are replacing the inefficient code with the efficient implementation, so programs should still run without the acceleration, just more slowly. Is your point that this could make the program unusably slow?

Have you looked at generic higher order program transformation as a more general way to accelerate functional code?

generic_program_transformation.pdf

Performance regression

There's no point in a jet that offers only minor performance benefits, or one that is obscure and rarely used. If we have a jet at all, its effect should be significant. People will soon come to depend on a jet once it's widely available. Performance is allowed to progress, but performance regression is usually unacceptable. Even if the programs are still usable, just slower, you'll damage the confidence of your programmers. Your language won't survive more than a few such blows to its mind share.

Of course, this could be mitigated if you replace a jet by a more general one, especially if use of the jet is mostly hidden behind a library or two and you work with the developers of these libraries to cleanly transition. So jets can be replaced with some hassle, just not casually removed without replacement.

generic higher order program transformation as a more general way to accelerate functional code?

I know program transformation is useful for optimizing code. I've worked with general program rewriting languages like Maude and OBJ, and I've used GHC Haskell rewrite rules. For my Awelon language, I've spent some time contemplating support for user-definable transforms and fusions like `[F] map [G] map == [F G] map`.
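
For reference, the GHC rewrite-rule form of that fusion law looks like this (GHC's base library ships an equivalent rule for its own map):

```haskell
{-# RULES
"map/map"  forall f g xs.  map f (map g xs) = map (f . g) xs
  #-}
```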

But I think program transformation doesn't fill the same niche as jets, not unless taken to some awkward extremes. Jets effectively extend our set of "performance primitives", whereas program transforms work with an existing set of performance primitives.

Further, such program transforms and rewrites are more ad-hoc and fragile even compared to jets. For jets, we can be confident that performance is stable so long as we don't modify a well documented set of 'standard' definitions. For program rewrite transformations, any refactoring or abstraction - any change of context that causes another rewrite rule to locally be recognized first, or prevents one from firing - can significantly affect performance, unless you ensure rewrites are both confluent and penetrate abstraction layers (which has implications for separate compilation).

I think effective support for multi-stage programming is worth more. With staged programming, programmers can explicitly represent intermediate programs as data structures, transform them explicitly and deterministically by their own ad-hoc rules, then statically 'compile' them into host language functions or another accelerated DSL. This shoves responsibility for stable performance of transforms back onto our programmers and cleanly separates the optimization model from our host-language abstractions.
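
A minimal sketch of that staged style in Haskell, with hypothetical names: the intermediate program is ordinary data, the optimization pass is ordinary user code, and only at the end does it become a host-language function.

```haskell
-- Stage 1: programs as plain data that users can inspect and rewrite.
data Prog = Id | Comp Prog Prog | MapF (Int -> Int)

-- Stage 2: a user-owned, deterministic rewrite pass; performance is the
-- programmer's explicit responsibility, not a fragile compiler heuristic.
optimize :: Prog -> Prog
optimize (Comp p q) =
  case (optimize p, optimize q) of
    (MapF f, MapF g) -> MapF (f . g)   -- map/map fusion, done by hand
    (p', q')         -> Comp p' q'
optimize p = p

-- Stage 3: "compile" the optimized description into a host function.
compile :: Prog -> [Int] -> [Int]
compile Id         = id
compile (Comp p q) = compile p . compile q
compile (MapF f)   = map f
```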

Fractally wrong non-arguments

Yarvin's response does not actually address any real objection, but let me deconstruct it piece-by-piece.

To start, I find it curious to conflate intrinsics and FFIs as if they address the same problem. Intrinsics provide a higher-level interface to platform-specific instructions without having to drop down to inline assembly. FFIs, especially in the context of higher-level languages, can also be used for acceleration by dropping down to a lower-level language, but another important benefit is simply leveraging preexisting code.

The correct way to think about jets is not to compare them to a perfect interpreter implemented by unicorns. Jets are a combined replacement for two other techniques in language implementation: intrinsics and a foreign-function interface.

How does glorified peephole optimization in any way obviate the need for dropping down to intrinsics, or even asm, to take advantage of platform-specific acceleration? Haswell (as I recall) introduced a series of ops to accelerate bignums. Is Urbit going to detect when a programmer is implementing arbitrary-precision arithmetic and emit these ops? Yarvin's protestations aside, this is precisely the same "sufficiently smart compiler" snake oil we've seen peddled time and time again. I'll believe it when I see it.

Compared to both intrinsics and an FFI, your jets have one key disadvantage: they need to be specified in pure code.

So, in other words: "a key disadvantage of jets is that they do not have the advantage of FFIs". That you can implement routines in the language instead of using an FFI is about as facile an argument as the claim that all Turing-complete languages are equally expressive because they are capable of computing the same set of things.

Compared to intrinsics, jets give you true separation between mechanism and policy; only the interpreter, not the programmer and not the compiler, needs to know what's accelerated.

This is just fundamentally false. If we want our program to terminate before the heat-death of the universe we must be acutely aware of what will and will not be accelerated. In fact, the situation is far worse because the programmer must be aware of which jets are available and intimately familiar with the inner mechanics of the interpreter so as to write something which actually executes in a reasonable amount of time.

Re: fractally wrong

This "glorified peephole optimization" is useful largely because it doesn't involve "sufficiently smart" anything. It's utterly simplistic and it works. Platform specific optimizations are naturally left to the implementation of the jet, just as they would be for intrinsic functions.

Urbit would not detect arbitrary precision arithmetic in general. But mapping specific reference implementations for specific functions to a high performance library is not difficult. And that's essentially what jets do.

I agree that FFI has additional use for integrating external code. Jets clearly correspond only to performance-motivated applications of FFI. However, whether entangling external code is an advantage or a disadvantage depends on language goals (like mobile code, security, static analysis, and partial evaluation). IIRC, there is another question addressing this point for Urbit.

I also agree that programmers need to know what is accelerated in practice, and I said as much. I think even Yarvin would agree, and might clarify about this 'practical need' versus whatever he meant in the quote.

I disagree that this need for programmer awareness makes the situation significantly worse. Programmer assumptions should be documented, and the best documentation is testable. My language uses a simple annotation to say, "this has to be accelerated, if not please raise an error".

Show me evidence

This "glorified peephole optimization" is useful largely because it doesn't involve "sufficiently smart" anything. It's utterly simplistic and it works. Platform specific optimizations are naturally left to the implementation of the jet, just as they would be for intrinsic functions.

Where's the evidence of this? I'd like to see empirical numbers, for non-contrived benchmarks. The kind of jets which would be required to produce code at parity with your standard C/C++/Fortran/Ada etc. are completely implausible. No one is going to be able to convert Nock, which is barely more than the SK combinators, into highly optimized, vectorized code with peephole optimization. If it were that simple, we would already be doing it with more expressive languages. This does happen to an extent, and yet inline asm for really tight loops (say, in video codecs) still isn't going away.

But more to the point, even if it were possible, what would be the point? What have you gained, except massive complexity to support a completely unreadable, unmaintainable language?

I disagree that this need for programmer awareness makes the situation significantly worse. Programmer assumptions should be documented, and the best documentation is testable. My language uses a simple annotation to say, "this has to be accelerated, if not please raise an error".

If you have to write some specific sequence of code to achieve the desired effect, that sequence effectively constitutes a lexical unit in its own right, so the question becomes: what on Earth is the point of having to manually specify each component anyway?

Fast jets

You seem to have the wrong impression about what jets/accelerators are doing. They won't turn arbitrary subprograms into vectorized code. But a jet could specifically recognize a "matrix multiplication" function and replace it with a highly vectorized implementation that leverages SSE or even GPGPU. More generally, we could accelerate an interpreter for a safe, pure subset of OpenCL or some other Khronos group spec, and thereby effectively embed OpenCL within the language. If you're interested in such techniques, the Haskell libraries "accelerate" and "GPipe" are of potential interest.

Jets accelerate specific function representations. Although the method of recognizing and replacing subprograms is similar to "peephole optimizations", the scales, conditions, and consequences are vastly different. It is much more useful to think about jets as intrinsics or performance-motivated FFI with a reference implementation. Having the reference implementation simplifies testing and security reasoning, and makes it a lot easier to preserve meaning in context of porting or versioning, multi-stage programming, partial evaluations, or mobile code.

I'm not certain what you're imagining in your last comment, because it seems too absurd to me. Perhaps a concrete example would help. For Awelon, I might have `foo = [accelerated function code](accel) inline` for each accelerated function. The annotation indicating acceleration requirements is simply `(accel)`. Annotations are flavored identity functions - they can support typechecking, debugging, and optimizations, but are not observable within the program. By recognizing (accel) we can report when a required accelerator is not recognized, and simplify recognition of accelerated functions. But we still need to know which function we should be accelerating.
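
In Haskell terms, the load-time check might look something like this sketch (hypothetical names, not Awelon's actual implementation):

```haskell
-- A definition may carry an annotation requiring acceleration. At load
-- time we either resolve it against the runtime's accelerator table or
-- fail loudly, rather than silently running the slow reference code.
data Def = Def { defName :: String, requiresAccel :: Bool }

checkAccel :: [String] -> Def -> Either String Def
checkAccel known d
  | requiresAccel d && defName d `notElem` known =
      Left ("required accelerator not recognized: " ++ defName d)
  | otherwise = Right d
```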

Also, jets aren't why Nock is "a completely unreadable, unmaintainable language". And the topic question is not whether Nock is good.

Nothing interesting

Jets are nothing more than a usual primitive, implemented in some host language, having a counterpart written in the target language - the usual meta-circular fashion. You get the same thing in any language written in itself, when execution is short-circuited to the (compiled?) host implementation rather than infinitely regressing on itself :) But because it is literally impossible to compile their "specifications" into reasonable machine code, they are left with merely annotating them.

Also, I'd like to say (and that's a big IMO) that you need to take anything that comes out of the Urbit project with a big grain of salt. This re-inventing of the wheel, while bamboozling all the usual definitions, is very sketchy. I was rather intrigued at the beginning and spent a lot of time studying the codebase. What is "marketed" as a complete reimagining of the process is nothing more than slapping strange names onto the usual things.

Useless Nock VM? A glorified SK combinator interpreter + extra goodies strapped on for good measure (basic lispy things, i.e. isatom?, eq?, generalized ca/dr, "cons", and a few macros to ease the pain).

Hoon? A "functional" counterpart of brainfuck with mundane "object-orientedness" added in (of course, every term is renamed to appear more "revolutionizing").

Maybe I am too stupid and I am missing the point. I'll be glad if someone proves me wrong on this one.

implemented primitive?

I use the word 'primitive' to refer to an axiom of the language - assumed, not implemented. An implementation corresponds to a proof in the language, and is built in terms of primitive axioms. What is it you mean by 'usual primitive' such that the quoted sentence makes logical sense?

Jets are nothing more than a usual primitive, implemented in some host language

Anyhow, I agree that jets/accelerators are not very interesting.

If they were very interesting, they probably would be a bad idea. :D

Simple, reliable, uninteresting things are the best. At least when it comes to low level computation.

take anything that comes out of Urbit project with a big grain of salt

Nobody here is advocating Urbit. But that doesn't mean we can't learn from it. Urbit's developers have spent a lot of time envisioning and actualizing highly declarative networked systems.

Besides jets, another idea worthy of exploration from the Urbit project is networking based on public sharing of encrypted messages, a sort of content addressed network with possession of cryptographic keys as a declarative basis for 'identity', that can be optimized into point-to-point connectivity. I think this idea would be readily extended to publish-subscribe or with shared CRDTs.

Everyone gets every message?

You're talking about that setup where every node receives every message, and if a node is able to decrypt the message, then the message was meant for that node?

I found that concept even more confusing than jets. What is the purpose of doing things that way? I know the Bitmessage project does things in a similar way, but their goal is some sort of guarantee of anonymity (which, I'm told, is not even achieved).

Outside of possible privacy benefits, what reason would you have to make a global broadcast system like this? Why is something like a URL not a "declarative basis for identity"? Doesn't global broadcast break down with a large number of nodes?

Logically broadcast, physically point to point

Logically, every node receives every message then throws away those it cannot decode. In actuality, messages are delivered only to nodes that can decode them. And usually there is only one such node. This is an optimization in the effects layer. Since it isn't physically global broadcast, scaling isn't an issue. There are no privacy benefits.

The motivation for this is to achieve simple, declarative network semantics. No complications from socket life cycles, routing, etc. No need for a separate concept of network object identity (no distinguishing multiple nodes with the same keys). Of course, network partitioning remains an essential issue.

URLs are relatively declarative except for the foundational parts controlled by assignment through ICANN or ISP. If network names were based simply on possession of cryptographic keys (like in Urbit) the foundation also becomes declarative. You can still benefit from using URL paths to model logical resources at finer granularity.

Sorry for the needless

Sorry for the needless exaggeration with that "usual" adjective.

I use the word 'primitive' to refer to an axiom of the language - assumed, not implemented

Primitives are, in a sense, axiomatic, but that is not their main property. You usually define everything else in a language in terms of some fixed set of primitives, but which set gets fixed is arbitrary.

Take combinators, for instance. You can define any combinator in terms of any others, and those in terms of others still. It's just that sooner rather than later the circle closes and you end up defining something in terms of itself. To avoid that, an arbitrary (and hopefully Turing-complete) set is chosen and "pinned down" to be implemented as primitives. They are primitive in the sense that they do their thing instantaneously, without relying on their definitions - not because they are somehow more special than other combinators and deserve to be marked as axioms. (See the sketch at the end of this comment.)

And they can be implemented according to their own definitions, provided you have the means to translate that definition to a "lower" level of the tower (sorry, there was a really good paper on that; if I ever find it in my mess I'll send you a message). And if you can't translate them, you can just "annotate/jet" them with a, hopefully, equivalent implementation.

So, "jetting" some procedure is nothing more than promoting an arbitrary piece of code to "primitive" status - meaning not that it is a new axiom for your language, but just that it will do its thing in a "1 step", immediate fashion, relative to the current interpreter.

Theory of concatenative combinators

I suspect you're referring to Brent Kerby's paper (link).

Operations marked as axioms in a language are indeed special, in the context of that language, as a direct consequence of being so marked. But we can distinguish language primitives from performance primitives - operations we axiomatically assume are "fast".

Hopefully all the language primitives are also performance primitives. Jets provide an additional set of subprograms as performance primitives.

Aside: I would not generally assume that jets are implemented atomically, in terms of big step semantics. They only need to be a lot faster than a naive small step implementation, which implies having fewer intermediate states. For example, accelerated multiplication of very large numbers could strategically break the numbers into word size pieces to multiply and add. Accelerated evaluation of Kahn Process Network descriptions (to leverage distributed CPUs and memory) would certainly not be atomic. Of course, observing intermediate states requires a debugger or other external perspective.
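
As a sketch of the limb-by-limb idea (Haskell, base-2^32 limbs, least significant first; not any particular runtime's bignum code):

```haskell
-- Numbers as lists of base-2^32 limbs, least significant limb first.
limbs :: Integer -> [Integer]
limbs 0 = []
limbs n = n `mod` 2^32 : limbs (n `div` 2^32)

-- Schoolbook multiplication: each step is a small, observable
-- intermediate state rather than one atomic big-step operation.
mulRaw :: [Integer] -> [Integer] -> [Integer]
mulRaw []     _  = []
mulRaw (x:xs) ys = addRaw (map (x *) ys) (0 : mulRaw xs ys)

addRaw :: [Integer] -> [Integer] -> [Integer]
addRaw xs     []     = xs
addRaw []     ys     = ys
addRaw (x:xs) (y:ys) = (x + y) : addRaw xs ys

-- Propagate carries so every limb fits back under 2^32.
normalize :: [Integer] -> [Integer]
normalize = go 0
  where
    go 0 []     = []
    go c []     = limbs c
    go c (x:xs) = let t = x + c in t `mod` 2^32 : go (t `div` 2^32) xs

bigMul :: Integer -> Integer -> [Integer]
bigMul a b = normalize (mulRaw (limbs a) (limbs b))
```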

Same idea without jets

I pursued a similar strategy in an earlier project, which specified low-level routines as finite state machines. Known macros were replaced with C code on a source-text matching basis. It works, but then why not just write all functions in C? The temptation is too great. Clearly Urbit has some very high-level jets now.

http://zanopha.com/docs/ustar.pdf

with more context at

http://zanopha.com/research.html

In response to Nock I formulated something very similar without the need for low-level hacks like this.

https://github.com/tibru/tibru

Spoiler - it uses look-up tables to implement byte-level arithmetic. These are specified in user code, not in the language. This is O(n), but bounded by the size of a byte, so really O(1). With an optimization in the runtime it can be made properly O(1). Larger numbers can be built up from bytes. It will have the same runtime performance as hardware, with a larger (but acceptable) constant factor. The optimization is not actually implemented in the code, but I see no reason it can't be added.
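
The look-up-table trick, sketched in Haskell (assuming the vector package): build the table once, and every byte addition afterwards is a single index.

```haskell
import Data.Word (Word8)
import qualified Data.Vector.Unboxed as U

-- 256x256 addition table, built once: entry (a*256 + b) holds a + b,
-- wrapping at 256 since fromIntegral truncates to Word8.
addTable :: U.Vector Word8
addTable = U.generate (256 * 256) $ \i ->
  fromIntegral (i `div` 256 + i `mod` 256)

-- Byte addition as a single O(1) table lookup.
addByte :: Word8 -> Word8 -> Word8
addByte a b = addTable U.! (fromIntegral a * 256 + fromIntegral b)
```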

No.

Once the obfuscated language is stripped away, one quickly realizes that so-called "jets" are nothing but a particular form of peephole optimization. These languages effectively mandate the usage of jets since any implementation without them would be apocalyptically slow. This means that the actual semantics of the language extend far beyond the putative specification, inasmuch as programmers for such a language will rely on these optimizations taking place.

Ultra-minimalist languages remove information about programmer intent which would otherwise be available to the compiler to make use of for optimization. The "jet" method (I'm rather loath to even use the term) means that any optimization must necessarily be pessimistic so as to preserve semantics: in short what you've done is force programmers to massively over-specify the program.

I think a good analogy is between those languages which offer array bounds checking and those which do not. Since bounds checks are embedded in the semantics of languages such as Ada, Pascal, etc., compilers for these languages can aggressively remove bounds checks where statically possible. Conversely, in a language like C, a much weaker form of bounds-checking elimination is possible: the compiler can attempt to recognize certain bounds-checking idioms and remove them where it can be statically proven that they are unnecessary. But what is the point of requiring all kinds of compiler heroics when the former accomplishes the same thing, in a simpler manner, and more effectively?

Jets solve a "problem" which does not exist.

Maybe

You're right that any tools that manipulate core language fragments have to be aware of the jets or risk breaking them. You still get the benefit of being able to consume core fragments without understanding jets.

I think your example of C isn't too fair, since the type safety issues in C make it harder for a compiler to prove things (like array bounds safety) that could be relatively simple to prove in another language.

I don't think jets are a good idea because I think explicit representation / implementation selection is more powerful / better, but the approach seems to offer some advantages over the status quo.

What advantages?

You're right, it's not a perfect analogue. But as an illustrative example, I think it is sufficient. The point is that jets create a particularly vicious abstraction inversion whereby a programmer must simultaneously think inside the object language and a kind of metalanguage. Design decisions almost always involve trade offs, but I am struggling to see any benefit whatsoever:

  • Do jets make the programmer's life easier? Certainly not! You tell me whether x = y + 5 or x = succ(succ(succ(succ(succ(y))))) is easier to read.
  • Do jets make the compiler writer's life easier? Again, no. Any reasonable implementation requires detecting a host of complicated (and potentially interacting) idioms, which generally must be pessimistic.

I have seen several optimizing compilers for Brainf**k written through the years, which can, for instance, combine series of increments and decrements into a single instruction: effectively a "jet". The difference between BF and jet-based languages is that at least the former is aware it's a joke.

well

It just makes the semantics simpler. That's pretty much it. And the initial language can focus entirely on semantics and let optimizations come later / after the fact. See David's comments about targeting GPUs, etc. Isn't this pretty much the same idea behind asm.js?

Simpler for whom?

It only makes the semantics "simpler" by selectively redefining what semantics means. If tail-call elimination were removed from Scheme, many if not most Scheme programs would simply cease to work. That the locus of tail-call elimination is the compiler, as opposed to a lexical construct, does not make it any less a part of the semantics of Scheme, the language. A jetted language sans jets would not work, for any meaningful understanding of "work." It's specious to artificially exclude jets from the semantics of the language, taken holistically.

Sure, I meant value

Sure, I meant value semantics, but jets do affect operational semantics. So the simplicity is letting you understand the one without the other.

I'm not so sure of even that

I think that argument commits a fallacy of composition.

Individual opcodes in an instruction set are easy to understand, exceedingly so. ADD r1, r2, r3 adds the value of register 1 and register 2 and places the result in register 3. Does this make actual assembly language programs easier to understand than Pascal?

I don't really follow this.

I don't really follow this. AFAICT, a jet is essentially an intrinsic function that comes equipped with a reference implementation. This obviously isn't anything revolutionary, but seems like a reasonable improvement over black box intrinsics that don't have a reference implementation.

accelerators and annotations

I prefer the term "accelerator" rather than "jet".

Accelerators taken alone do present some problems as you describe. For Awelon, I have annotations to solve these problems. If accelerators are glorified peephole optimizations, then annotations are glorified identity functions.

Annotations cannot affect observations within the computation. But they serve useful roles in debugging, controlling evaluation strategy, selecting representations, guiding static analysis, enforcing multi-stage programming, and so on. The "programmer intent" lost to minimalist semantics is recovered by annotations.

Trivial differences

Suppose there are multiple runtimes for a minimalist, accelerated language (let's call it "ABCD").

The authors of each runtime do not communicate, but they make similar decisions: natural numbers, products and coproducts should be accelerated as they are some of the fundamental building blocks of large programs.

However, the authors write their implementations in very slightly different ways, and each accelerates their own.

How is this sort of difference reconciled in the larger ABCD community? I don't really see a way to do it. There's no way you can, in general, force every present and future author of an ABCD runtime to agree on one particular way to write something like a library for natural numbers. And even extremely trivial differences, like having a function that takes its arguments as

foo bar

instead of

bar foo

are intractable and will break the recognition of accelerators.

It would be really awful to try and form some kind of standards body for accelerators, with all the problems of groups like the W3C except they're reduced to specifying trivial programs in a fundamentally unexecutable object language.

But suppose you managed to do this, and got everyone to agree to write their natural numbers in one particular way. The question becomes: what is the point of passing around this one implementation of natural numbers? No one will ever write another, since everyone wants it to be accelerated 100% of the time. It's effectively a new syntactic construct. Any time someone new comes in and begins the evil, no-good, very-bad process of writing their own library, the other members of the community shout at them repeatedly to use the one blessed implementation. At this point, it may as well have been a primitive in the ABCD virtual machine.

eventual standardization

We don't need to force runtime authors to agree. Here's what will likely happen instead: Runtime authors adopt each others' better ideas, marginalizing accelerators as competitive advantages. Market pressures somewhat arbitrarily kill some runtimes and promote others until there is a clear dominant victor. New development efforts tend to occur in the dominant runtime, solidifying its methods. Other runtimes must become more like the dominant one to use new libraries and survive. De-facto standards are thus established. Eventually, ABCD communities get together at "A Big Conference of Developers 20xx" and document these de-facto standards - including standard accelerators - and give the least controversial subset of them a 'community approved' status, and encourage runtime developers to support 'switches' to operate using just the standard accelerators. Most runtime developers somewhat grudgingly comply. Translating code between runtimes becomes not much more difficult than translating code between different versions of Haskell or Scheme, and a lot of code works unmodified.

Compared to conventional languages, use of accelerators and annotations instead of more ad-hoc language extensions doesn't much change the overall story. It just replaces a few of the details.

what is the point of passing around this one implementation of natural numbers? No one will ever write another

With eventual standardization, accelerators and annotations and even some libraries effectively become 'primitives' of the language. But standardization isn't a process we should casually skip, nor is it one that stops before a language dies - it just moves on from numbers to matrices to GPGPU computing, from (par) to KPN-based cloud computing, and so on. Even after features have stabilized and ossified, it's convenient to retain executable specifications about precisely what every feature means for purpose of analysis, study, and whole-system testing. Having the code in the codebase means we'll also have old versions in old codebases, which provides a clear historical context for software archaeologists.

Also, keeping the definitions is cheap. So why not?

Proving a product accelerator

In a language like our hypothetical ABCD, a product is a block whose argument is, itself, a block. When our product block is invoked on another block, it invokes the argument block on the two elements of the product.

Suppose we've written a word `pair` that we claim is able to construct products that behave like this. Then it seems easy to verify: the runtime can simply invoke `pair` on two special sentinel values that are impossible to forge from within ABCD. Then, it can invoke the resulting block on a special block that is unrepresentable in ABCD, and is able to inspect its arguments. The runtime can then verify that the special block received the two sentinel values as arguments, and that computation halted straight after. You could even do this at the level of source code strings, making the sentinel values blocks with garbage data like [dFsgJdkd3jdx], relying on the fact that free variables simply get passed along during evaluation. Then just assert that a source code string where you inject the definitions of `pair` and `unpair`, or `pair` and `fst` and so on has the correct behavior.
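
Rendered in Haskell, with blocks as functions and the sentinel strings standing in for the unforgeable "garbage data" (the second sentinel is made up), the scheme amounts to:

```haskell
-- Church-encoded products: a pair is a block that invokes its argument
-- block on the two elements.
pair :: a -> b -> ((a -> b -> r) -> r)
pair x y f = f x y

-- The runtime's check: apply the candidate `pair` to two unforgeable
-- sentinels, then hand the result an inspecting block that verifies it
-- received exactly those sentinels.
testPair :: Bool
testPair = pair s1 s2 (\a b -> a == s1 && b == s2)
  where
    s1 = "dFsgJdkd3jdx"   -- sentinel from the post
    s2 = "kQz9uXw2pLr0"   -- a second, hypothetical sentinel
```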

Doesn't this prove that the user-supplied product API is correct? And if this works, I imagine you could prove the coproduct and potentially natural number APIs as well (a coproduct is a block that receives two blocks and invokes either one or the other on its contained value; a natural number is a block that receives a block and another argument and applies that block N times to the argument).

I guess I'm missing something, since this seemed really easy and I'm wondering why Urbit doesn't mention a scheme like this anywhere. Assuming it works it seems to greatly diminish the "trivial difference" danger I highlighted above since you can quickly migrate between accelerated APIs after proving them equivalent. In fact, if the trivially different APIs all end up creating blocks that behave in the same way, couldn't you just accelerate all of them at once and solve the problem once and for all?

Testing isn't proof.

Testing can highlight mistakes or resist regressions, but malice can still slip by quite easily. I could create an accelerated `pair` that works on your garbage data but fails whenever the first argument is 42. Or perhaps that works for the first billion pairs, then fails.

Also, small differences between runtimes can add up quickly.

Consider the encoding for natural numbers. You suggested a Church encoding, ((a->a)->a->a). But you could add short-circuiting to that: ((a -> (a+r)) -> a -> (a+r)). Or you could use a recursive-type encoding `Nat = uN.(1+N)`, corresponding to `type Nat = Z | S Nat`, and operate on it via a fixpoint combinator.
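
In Haskell notation (RankNTypes for the first two), those three encodings are:

```haskell
{-# LANGUAGE RankNTypes #-}

-- Plain Church encoding: a number is its own iteration operator.
newtype ChurchNat = ChurchNat (forall a. (a -> a) -> a -> a)

-- Church encoding with short-circuiting: the step function may stop
-- early by returning a final result in Right.
newtype ScNat r = ScNat (forall a. (a -> Either a r) -> a -> Either a r)

-- Recursive-type encoding, Nat = uN.(1+N), used via a fixpoint combinator.
data Nat = Z | S Nat
```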

For most purposes, these differences could be considered trivial. It won't affect arithmetic. You can write a repeat function in each case. You can translate between them without loss of information. But combinatorial factors will swiftly get away from us. I would not count on automatic conversions between communities becoming a thing.

IIRC, Urbit simply centralizes a lot of the early decision making. Yarvin has a whole vision of how the Urbit system should mature and upgrade. A benevolent dictator is certainly one simple way of avoiding the variant runtime issues.

Examples of malicious blocks?

How would you create such a `pair`? I don't see how you'd be able to check what arguments it's applied to in the `[abcd]` basis. In order for it to check if an argument is `42` it'd have to call some sort of predicate, and that predicate would just stop evaluating if the argument didn't have the expected shape. There isn't anything like Lisp's tag predicates in `[abcd]`. I'd love to see an example function that can have the expected behavior on some arguments but not others.

I know that testing isn't proof in general, but it seems to me that for a fully parametric API like products and coproducts, you can in fact exhaustively test them and prove them correct. Just like how a function of type `forall a. a -> a` must be the identity function, or a function of type `forall a b. (a * b) -> (b * a)` must be a swap due to parametricity.

As far as Church vs. other encodings, I probably didn't explain my position clearly enough. I'm proposing a different approach to the way runtimes accelerate certain classes of functions: instead of saying "use this specific function and we'll make it run fast", they say "use this specific encoding and we'll make it run fast". The advantage is that we don't have to write our functions in literally the exact same way. As long as you can implement the API for a Church encoded type then it'll be accelerated, even if you use different data plumbing internally. It isn't a fully general solution to the problem of acceleration, but for trivial (e.g. fundamental) types like products and coproducts it strikes me as an improvement.

Ah.

I think I misunderstood you. You're proving that the accelerated function is the one expected by the runtime, not proving that the runtime implementation has the specified behavior.

Well, with pair, you could get by using opaque error values in testing. But that won't generalize to accelerators that naturally must observe values, such as the add function on two natural numbers. Proving your add function correct by testing is infeasible.

I think most accelerators will make some observations on their inputs. It doesn't seem to be worth optimizing for the rare exception that is pure data plumbing.

Just want fundamental types

Yeah, there's no way to prove arithmetic correct, and the vast majority of functions you'd want to accelerate won't work with this scheme.

But my thinking is this: accelerated products and coproducts alone might push something like `[abcd]` into acceptable performance with a naive bytecode interpreter, reducing the need for accelerators in general for people who are uncomfortable with them. If you can accelerate those types "for free", then it removes one of the main complaints about accelerators, which is that without them your programs "won't finish before the heat death of the universe". An `[abcd]` runtime could be competitive with, say, Ruby, with zero accelerators beyond the free ones you get from products, coproducts, and any other parametric types made out of products and coproducts (such as lists, which are just `mu L. ((a * L) + 1)`).

Simpler approach

If the benefits are sufficient, sure. You could specially recognize pairs and sums. You might double the efficiency compared to a naive implementation.

Rather than ad-hoc testing, I suggest a static type analysis to recognize pure data plumbing accelerators like pair, sum constructors, simple data plumbing, inline, and fixpoint. Ultimately this is almost the same idea just using type variables instead of opaque data. But it feels less arbitrary to me.

type classes

I think this is all related to an equivalent issue with type classes. With a type class, one provides two kinds of methods: the "axioms" which have to be specified in instances, and the "lemmas" dependent on them which cannot be. Sometimes you have a lemma which is an "axiom" with a "default" definition which can optionally be overridden in an instance.

This is a direct analogue of virtual methods in a C++ class with definitions which can optionally be overridden.

In reality the actual problem is more complex. Any system has a collection of alternative bases: there's more than one axiomatisation for the same set of theorems. In a real system, the bases do NOT consist of independent axioms, but of a set of related lemmas. For a class, a simple but non-independent set of overridable methods is provided, and now the instance provider has a real problem: how to ensure a suitable spanning set is provided? And if it is overspecified, how to ensure the supplied set is consistent?

What I am saying here is that most languages *fail* to actually check that a spanning set is provided. For example, a total order could be specified with both less-than and greater-than overridable, each defined in terms of the other. In an instance you can specify one, the other, or both (if both, the definitions have to be consistent), but if you forget and specify neither, you get an infinite loop at run time. What you have is a set of methods with definitions that have dependencies, and you have to remove the cycles.
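
Haskell's Eq class is a concrete instance of this hazard: (==) and (/=) each have a default defined via the other, so an instance that supplies neither still compiles (GHC's MINIMAL pragma makes it only a warning) and loops at run time:

```haskell
data T = T

-- (==) defaults to not (x /= y) and (/=) defaults to not (x == y).
-- With neither method supplied, GHC emits only a warning, and
-- evaluating T == T recurses forever.
instance Eq T
```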

It's worse. You could have more than one definition of a method, with different dependencies. So the system should choose one that eliminates cycles.

I have yet to see a language that does this instead of stupidly fixing the base set of axioms to be independent, thereby defeating optimisation.

And that's the problem with "jets". We have an axiomatic core and optimisations built on top, to provide a bunch of simple "lemmas". What we REALLY wanted was a richer, over-specified base language, which the compiler can automatically optimise, and for which better compilers can be written that do better optimisations. In other words, the method of a simple minimal axiomatic core with bolt-on lemmas is bad: it's what mathematicians do, not what programmers need.

Finding a language which is "just rich enough" is what type theorists do. The big problem is lifting this out of the compiler and putting it into the library. That means you need a meta-language which is strong enough to represent the proofs of the lemmas in terms of other lemmas, because those proofs ARE the optimisations. The compiler can do basic stuff, we need the user to be able to do it as well. At the moment, the user does it by hand optimising an algorithm, which means refactoring combinations so they're no longer combinatorial and are thus fragile.

Behavioral OBJ

For a language that is defined in terms of refining modules of axioms via implementation rewrite strategies, you might study BOBJ or to a lesser degree Maude (with suitable extensions). This is the closest I've encountered to what you describe.

I've studied and used such languages. Then I decided to use accelerators aka "jets".

You call jets stupid. "If it looks stupid but it works, it isn't stupid." -- meme, origin unknown.

Jets are simple, predictable, and work nicely both compiled and interpreted. They also work nicely with simple languages. I believe these features are each worth a lot and shouldn't be dismissed casually.

Although it is feasible to develop programs in terms of axioms and lemmas and user defined optimizations, it requires a more sophisticated language and compiler, is not very interpreter friendly, and does not lead to stable performance or behavior without a lot of careful development and consideration.

Regarding "defeated" optimizations, obviously we cannot generally rewrite `fmap f . fmap g` to `fmap (f . g)` without some context of axioms to argue it safe.

However, I think this kind of syntactic optimization doesn't account for much benefit in practice. First, it's fragile under modularization, abstraction, or separate compilation. Second, for best performance, generic code must be optimized *after* specialization to a data type. Effective support for multi-stage programming seems a simpler language design investment for similar performance benefits (just need a statically enforced separation of now-values versus later-values).

This Resonates

This resonates with me. Despite my limited hardware background, I've thought about it as "Design a CISC, implement a RISC." That is, since our implementation tools are inadequate for untangling and consistency-checking such a spanning tree of mutually interdependent implementation fragments, we need to implement small kernels and build on top of them via procedural abstraction. As interesting as small axiomatic systems are, nobody wants to program them. You have to expend significant design effort on the larger, user-facing interface. Keeping the core small is only a goal in that it minimizes work and risk to satisfy the full interface. The cost to growing the core is more code to write, maintain, and keep consistent. But the cost to keeping the core too small is that you lose significant high-level knowledge from the full interface that is useful for performance, debugging, etc.

Can give wildly varying results without feedback

J has long identified common phrases and replaced them with "special code," often resulting in 2 or 3 orders of magnitude speed-up. I never heard the term jet before, but this is clearly not new and is being used in at least one serious language. The downside to what the J system does is that there are multiple, equivalent ways of writing short runs of code, and one phrasing might be 10,000x slower than the other with no feedback. You have to be aware of what's going on under the hood.