U, a small model

U is a small (< 1000 words) computational model.


U is different because it:

  • has no user-defined functions or lambda
  • has a built-in hyper-Turing operator
  • expects an unreliable packet network

I have to apologize for posting this weird beast with no real context. I'm working on higher layers that should make its purpose clearer, but they are not done, and we all hate hand-waving. Still, the U spec is self-contained, and it should be precise and readable. If anyone can look at it and maybe even find some gotchas, I'd be super grateful.

I'm especially curious about the follow operator. Are there any logical flaws in its definition, or ways it could be improved, generalized or tightened?

Also, if anyone knows of any other UDP-level functional languages, or anything else U reminds them of, I would of course be happy to hear it.

(curtis dot yarvin, at gmail)



I think a tad more context would indeed be useful.

Huh indeed...

Briefly, there are really just a couple of ideas in U.

One is to see if we can avoid assuming that concepts like variables, functions and types are essential parts of a computational model, just because they are essential parts of a programming language. A lot of the math behind PL theory originated when "computer" was a job description, and some programmers seem to find it difficult to learn. FP in the real world seems to be most popular in the form of Web template languages, so U models computing as recursive template expansion. It certainly takes a little syntactic sugar to build a usable programming environment on top of this, but not as much as one might think. And the sugar can, of course, be defined within the environment, because anything that can compute can parse.
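The "computing as recursive template expansion" idea can be sketched with a toy expander: a table maps names to bodies, and expansion recurses until the text stops changing. This is a generic illustration of the idea, not U's actual mechanism; the table, placeholder syntax, and depth limit are all assumptions.

```python
# Toy recursive template expansion (illustrative only, not the U spec).
# Templates map names to bodies; {name} placeholders are substituted
# repeatedly until the result settles.

TEMPLATES = {
    'greeting': 'hello, {name}',
    'name': 'world',
}

def expand(text, templates, depth=0):
    """Recursively substitute {key} placeholders from the table."""
    if depth > 100:                      # crude non-termination guard
        raise RecursionError('template expansion did not settle')
    out = text
    for key, body in templates.items():
        out = out.replace('{' + key + '}', body)
    return out if out == text else expand(out, templates, depth + 1)
```

Note that a circular template table would never settle, which is why even this toy version needs a non-termination guard.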

Two is that FP and the Internet actually go together really well, because you can think of the semantics of any packet networking server as a single function, whose input is the current state of the server and the latest packet in (a packet is just a large unsigned integer), and whose output is a state change and a list of packets out. Because the Internet is an unreliable (best-effort) network, you can reorder packets for any reason whatsoever, even if the change is semantically nontransparent, and drop them when you have any kind of a computational problem. This is not a hack - it is actually what best-effort networking was designed for.
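The "server as a single function" view above can be sketched in a few lines: the function takes the current state and one inbound packet (an unsigned integer) and returns a new state plus a list of outbound packets. The counter behavior here is a made-up example, not anything U defines.

```python
# Minimal sketch of a packet server as a pure function (illustrative).
# Input: current state + one inbound packet (an unsigned integer).
# Output: new state + list of outbound packets.

def packet_fn(state, packet):
    """Pure toy server: accumulates packets, acks with the running total."""
    new_state = state + packet          # the state change
    out = [new_state % 2**32]           # packets out, kept unsigned
    return new_state, out

def run(packets, state=0):
    """Drive the pure function over a packet sequence. Dropping or
    reordering packets just changes which calls happen; the function
    itself stays referentially transparent."""
    sent = []
    for p in packets:
        state, out = packet_fn(state, p)
        sent.extend(out)
    return state, sent
```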

And your packet function can have hyper-Turing semantics, because you can always write a partial implementation of a hyper-Turing function - eg, you can write an algorithm that solves the halting problem, as long as one of its return values is "I don't know whether this halts or not." This allows you to standardize formal HT semantics as part of a packet protocol definition - if you can't compute the function, just drop the packet. One way to think of an HT function is as a formalization of trust; the easiest way to implement HT semantics is to just have a list of known cases, and trust is when you return a value that you haven't actually computed.
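The "partial implementation of a hyper-Turing function" point can be made concrete with a halting decider that is sound but incomplete: it answers from a list of known cases and otherwise says it doesn't know. The table and names are invented for illustration; the "trust" is exactly the known-cases list the comment describes.

```python
# Hedged sketch of a partial hyper-Turing function (illustrative).
# A halting "decider" that is never wrong but may answer UNKNOWN.

HALTS, LOOPS, UNKNOWN = 'halts', 'loops', 'unknown'

# Trusted answers for specific programs (represented here as strings).
KNOWN_CASES = {
    'pass': HALTS,
    'while True: pass': LOOPS,
}

def halts(program):
    """Sound but incomplete: looks up known cases, else punts."""
    return KNOWN_CASES.get(program, UNKNOWN)
```

In the protocol setting described above, UNKNOWN would simply mean "drop the packet."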

Sadly, these days the phrase "programming language" has come to mean "programming language for making arbitrary modifications to the state of a Unix kernel." Restricting a programming environment to sending and receiving UDP packets (the only user-level access most of us have to true packet networking) might seem rather, um, restrictive. But if you think about it, there's a lot you can do with a stateful UDP server, especially one whose semantics are reasonably formal. After all, anything you can do with IP, you can do with UDP.

Of course, a stateful UDP server will just degenerate into yet another side effect machine if the packet function is just passed the current state as an argument. The trick is not to do this, but instead to construct a referentially transparent, immutable namespace that hides the implementation details of the state. (For example, the state is $a in most of the U rules.) In fact, if you add a layer of metacircularity, you can turn this into a referentially transparent distributed namespace, because (thanks to packet reordering) your lower-level interpreter can send a request and receive the response before the higher-level packet that actually needs the resource can use it...

Anyway. I am waving my hands and I promised I wouldn't do that. Suffice it to say that there is more stuff coming down the pike, so if anyone finds the U spec cryptic or inexplicable, they are, well, absolutely right, and I apologize for trying their patience. Frankly, the main reason I posted this thing today is that these days you can't be too proactive about getting stuff into the public domain :-(

(curtis dot yarvin, at gmail)

only skimmed it, so no comments

Sadly, these days the phrase "programming language" has come to mean "programming language for making arbitrary modifications to the state of a Unix kernel."

But, I liked this quote.

even skimming gets you points!

Thanks :-)

To expand slightly on the quote, I wasn't just trying to be clever.

I grew up on Unix and still love it dearly, and I really do not see how anyone could write a better pure FP language for Unix programming than Haskell. But I still wonder if Haskell isn't pretty much as successful as it will ever be.

Unix pretends to be language-independent. But it really has a native language, and that language is C. Other imperative languages are fine for high-level, light-duty work, but even C++ smells a little wrong to anyone who learned to code on a 3B1. And of course you can wrap system calls in a monad - but I find the idea that monads are easier to understand than direct imperative programming a little strong. Pure FP is wonderful, of course, but pitting it against imperative languages in a natively imperative OS may just not be a fair fight.

I don't have as much experience with it, but I suspect the same is true of, say, CLR and C#.

So here is a question (which perhaps would do better in its own thread), for pure FP experts, from a self-confessed Unix mossback and incorrigible kernelhead: what would an OS that was actually designed for pure FP look like? For fun, let's say you can redesign the hardware, too...

the optional implicit parens

the optional implicit parens around nested pairs makes matching confusing - maybe it should be left as sugar for after the model.

on the other hand, replacing the decimal codes in the patterns with meaningful names would make the semantics clearer.

it wasn't clear to me if the codes in the rules are the same as number terms, or are specific to the rules (ie like %, $ etc). if the former, that seems like a mess - numbers have a relationship to each other (+) that rules don't have.

(and that's as far as i got, sorry)

good comments

Optional implicit parens: to clarify the question for others, all this means is that you can write (a (b (c d))) as (a b c d). At the very least, I should probably state this more explicitly.
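The desugaring described above can be sketched mechanically: any tuple of more than two elements folds from the right into nested pairs, so (a b c d) becomes (a (b (c d))). Tuples stand in for U terms here; this is an illustration, not the spec's grammar.

```python
# Sketch of the implicit-parens sugar (illustrative, not the U grammar):
# an n-tuple (n > 2) folds from the right into nested pairs.

def desugar(term):
    """Rewrite (a b c d) into (a (b (c d))), recursively."""
    if not isinstance(term, tuple):
        return term
    parts = [desugar(p) for p in term]
    while len(parts) > 2:
        # fold the rightmost two elements into a pair
        parts[-2:] = [(parts[-2], parts[-1])]
    return tuple(parts)

# ('a', 'b', 'c', 'd')  ->  ('a', ('b', ('c', 'd')))
```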

Numbers in rules (eg, 5): no, these are just numbers. Replacing them with names (eg, by replacing "number, pair, foo" with "number, pair, foo, symbol") could certainly be done. And this is certainly how it is done, of course, in Lisp and other code-as-data environments.

The only reason U doesn't have symbols is that it doesn't need them. Why so minimalistic? Really just because I'm a networking and OS guy, and that is how we do it. But adding another kind to what is supposed to be an untyped layer adds a case to every routine that manipulates terms generically. Sometimes this is trivial, and sometimes not. For example, if you are defining a total order on terms, the complexity of this is O(n^2) in the number of kinds of term.


ok, so looking at it again,

ok, so looking at it again, you seem to have a state machine in the patterns. but i bet you no-one (certainly not me!) is going to try working out what that machine is trying to do. it *really* needs to be more user-friendly. at the moment reading the document is more like trying to fathom a language like brainf*ck; i presume you'd rather that it said something (possibly) interesting about non-termination...

by using numbers rather than separate symbols you mix operations with numeric calculations.

the fact that rules need to be ordered suggests that you could find a more minimal/orthogonal approach.

have you looked at the recent link here on rho calculus? it's possible that is relevant (it appears to be connected with formalising pattern matching).

rule (s) seems to imply an infinite loop; (r) too?

actually, I'm amazed and impressed

that anyone even tried to understand the rules! Trust me, I wrote this damn thing, and it took me more than a few minutes to figure out how to use it.

Part of the problem is that the U spec is written like a protocol spec, not like a PL paper or design document. I genuinely believe that you could hand this spec to the average software engineer at, say, a cell phone OEM, and he or she could write a correct implementation. Without, of course, understanding it at all. But believe it or not, a lot of the industry actually works that way.

So people like me are used to scanning specs for salient details without really understanding their motivation. I find it hard to blame anyone else for not sharing this peculiar taste.

Actually, U is not a state machine - it is a stateless function. But it is meant to be used in a state machine, which is probably what you meant.

I am very comfortable having unsigned integers as a built-in feature. Again, this reflects my generally amathematical aesthetic.

The pattern match is ordered because that seems to minimize the number of operators and constraints. For example, with an atomic variable constraint you can get rid of a lot of the ordering. But ordering, again from my mathematically naive perspective, felt cleaner. Since the rule pseudocode is not a user-level feature, and is inherently informal (every formal system has to be defined in English at some level), adding formal firepower strikes me as overkill.

Yes, (r) and (s) both [in my terminology] choke. The simplest term that completes is a triple ($a $b $c). Or as I prefer to think of it, ($state $verb $noun)...



I have at best skimmed the document; to me it seems like it redefines a lot of standard terminology. For example, a computation either chokes or completes, whereas I would expect it to diverge or converge. My impression is that you call values "terms" - why not just call them values? You define what a number and a pair are, whereas you only mention that the foo is unique.


Sorry, I am not a PL researcher and it shows :-(

Basically, U is the lowest level of a layered network protocol spec. So it is written for more the RFC audience, which does not assume any knowledge of PL theory. Also, terms like "choke" and "complete" are likely to reappear at multiple levels, making them effectively part of the UI in even high-level languages. (There are already enough really good languages to meet PL researchers' needs. Even if there weren't, I am certainly not the right guy to write one.)

U doesn't define any operators besides equality on foo, which effectively makes it featureless. Again, I figure a less PL-adept audience is more likely to just assume that foo equals foo.


Combinator calculus

My first impression is that U is a combinator calculus, except for * and > operators. The best known of the combinator calculi is SK combinator calculus, which you should have no problem googling for. It can be described very briefly, in your notation, as follows:

   [name]   [pattern]            [definition]
    (I)      (I $a)               $a
    (K)      (K $a $b)            $a
    (S)      (S $a $b $c)         ($a $c ($b $c))

Keep in mind that these bind to the left, so (K $a $b) is equivalent to ((K $a) $b).
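The three rules above can be run directly with a small leftmost-outermost reducer. Here applications are binary tuples, so (K $a $b) is spelled ((K, a), b) per the left-binding convention; this is a generic SKI illustration, unrelated to the U spec itself.

```python
# Minimal SKI reducer on left-nested binary tuples (illustrative).
# I a -> a;  (K a) b -> a;  ((S a) b) c -> (a c) (b c)

S, K, I = 'S', 'K', 'I'

def step(t):
    """Try one leftmost-outermost reduction; return (term, reduced?)."""
    if isinstance(t, tuple):
        f, x = t
        if f == I:                                     # I a -> a
            return x, True
        if isinstance(f, tuple) and f[0] == K:         # (K a) b -> a
            return f[1], True
        if (isinstance(f, tuple) and isinstance(f[0], tuple)
                and f[0][0] == S):                     # S rule
            a, b, c = f[0][1], f[1], x
            return ((a, c), (b, c)), True
        f2, r = step(f)                                # reduce inside
        if r:
            return (f2, x), True
        x2, r = step(x)
        return (f, x2), r
    return t, False

def normalize(t, limit=1000):
    """Reduce to normal form, giving up after `limit` steps
    (combinator reduction need not terminate in general)."""
    for _ in range(limit):
        t, reduced = step(t)
        if not reduced:
            return t
    return t
```

For example, S K K applied to any argument behaves like I, which makes a handy sanity check for the rules.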


Yes, although I think of combinators as constructing functions. Terms are just data - think of them as something like XML.

Also, * and > are a big exception! For example, * is the only thing in U that really creates any garbage. And of course > is undecidable.

I have not really worked with combinator models, but my general impression is that it takes essentially an infinite amount of syntactic sugar to turn them into a programming language. U certainly takes some sweetener, but not, I think, as much.