Popr Tutorial: Dot Machines

I have been working on a series of tutorial articles for my language, Popr.

For the first article, I designed a notation to present the semantics of the language in a way that I hope is easy to understand, and yet similar to how it is implemented in the compiler.

Concatenative languages, such as Popr, can be very terse. While this conciseness can make code both easier to read and to write, it can also make the same tasks difficult for those without a good understanding of the language. Furthermore, without the right semantic model, it is possible to develop a false understanding of the language that works for simple programs while leaving other programs as a mystery.

For this reason, I have developed a graphical notation for Popr that, while impractical for larger programs, can help develop an intuition for how the language works. In this notation, we will build machines that consume and produce dots.

Amb

I'm also developing a concatenative, functional PL. At one point I had considered a built-in ambiguity feature, similar to your `|`. My motive was to allow the sort of ambiguity we encounter in natural language dictionaries: a word is given multiple meanings, then we resolve the ambiguity based on context of use (such as types).

I eventually decided against this. Although convenient for expression, I found it difficult to reason about both the meaning and performance of programs, especially in the context of loops. We expect natural language would resolve ambiguity once for the whole loop (which is closer to type-driven overloading), whereas an amb operation preserves and re-resolves the ambiguity on each iteration. Type reasoning and parametric polymorphism are complicated because `|` does not compose types as an algebraic sum. I felt it "worked" okay for small-scale toy problems (albeit of dubious utility even there), but I'm not convinced it scales to large applications.

Each choice is only resolved once

Each choice is only resolved once, so for example:

 : 1 2 | dup +
   2
   4

The results 2 and 4 are valid, but 3 is not, because the language must commit to a choice of 1 or 2.
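
For intuition, here is the same thing modeled in Haskell's list monad (just a sketch of the semantics, not how PoprC implements choices):

-- committed choice, modeled with the list monad:
-- `|` introduces the choice; `dup` shares the committed value.
results :: [Int]
results = do
  x <- [1, 2]    -- 1 2 |  : choose once
  let y = x      -- dup    : share the committed value, don't re-choose
  pure (x + y)   -- +
-- results == [2,4]; 3 is impossible because x and y always agree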

So, if you want to make a choice once in a recursive function, the choice can be passed as an input, such as:

 : 3 4 | fact
   6
   24

Note that `fact` is the typical recursively defined factorial function.

This is addressed in the tutorial by making choices before a machine is run, so that one machine is split into multiple versions. This can be done conceptually for a recursive function by unrolling the function.

Furthermore, the current design of PoprC limits the scope of any choice made within a recursive function to a single iteration, because each recursive call must be a join point. This means that the programmer has to ensure that there is only one valid choice before making a recursive call. This ensures that, at any point, the number of valid branches is constant regardless of the input, preventing combinatorial explosion.

Re: resolved once

I understood that each `|` is a committed choice. But that isn't sufficient to avoid combinatorial explosion. Contemplate examples like `[2 * 5 |] dup . dup .` instead.
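
To make the concern concrete, here is a rough Haskell counting model of my reading (each copy of the quote carries its own unresolved choice):

-- one copy of [2 * 5 |] maps x to either x*2 or 5:
step :: Int -> [Int]
step x = [x * 2, 5]

-- each `dup .` doubles the copies, so `[2 * 5 |] dup . dup .`
-- composes four of them: up to 2^4 = 16 resolution paths.
paths :: Int -> [Int]
paths x = step x >>= step >>= step >>= step
-- length (paths 1) == 16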

Also, you seem to be saying that you're doing something special with recursive calls. I think relying on special semantics from the name layer is awkward. It shouldn't be difficult to express anonymous recursion in your language, e.g. the Z-combinator. I'll give it a try later, when I'm not on my phone.

Popr operator (modularity and bugs)

Your language's eponymous `popr` function has an unusual semantics: `[A [B]] popr => [A][B]`.
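
Concretely, one might model the operator like this (a Haskell sketch of the semantics as I understand them, not PoprC's actual representation):

-- a quote holds pending code plus results that are already available;
-- popr extracts the rightmost available result, or fails if none is.
type Code = String
data Quote a = Quote Code [a]   -- rightmost result at the head

popr :: Quote a -> Maybe (Quote a, a)
popr (Quote c (v:vs)) = Just (Quote c vs, v)  -- [A [B]] popr => [A] [B]
popr (Quote _ [])     = Nothing               -- nothing ready: failure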

I had considered and rejected this feature for my Awelon language. I called it "extract". My motive was mostly in the context of partial evaluation. That is, we can view `[A [B]]` as a partially evaluated expression, and we would extract from it a partial result. However, upon consideration, I believe this to be an unnecessary violation of modularity. It's convenient to understand a function as being defined by behavior rather than representation.

Introducing a primitive operator like popr requires us to understand representations and internal arities for every function. The relevant question becomes "how many `pushl` operations are required before I can safely `popr`?". Where we must understand representation, it seems wiser to make it explicit - for example, to require `[[B][A]]` instead of `[B[A]]`. This way, we only need careful attention for a well defined subset of functions, and representation can be managed by static type analysis.

ASIDE: your popr operator also has bugs! For a repro, consider the following session:

: [* 5] popr
 [*] 5
: :def foo: * 5
: 3 4 foo
 12 5
: [foo] popr
: [3 foo] popr
: [3 4 foo] popr
 [3 4 *] 5

My intuition is that if I define `foo` to `* 5` then it must have the same behavior as `* 5` under all observers, including `popr`. Otherwise, how can I possibly do equational reasoning and refactoring? But it seems you're considering `[foo] popr` to be a failure while `[* 5] popr` is not.

Explicit arity syntax is expensive and noisy

Making arities explicit has a very high cost that I am not willing to pay.

If I were willing to give up multiple return values, I could use sexprs and make Popr into another Lisp, but this would end up being an entirely different programming language, and many of the interesting primitives would be difficult to use.

Haskell has a similar problem with currying, because something like `foldr (+) [1,2,3]` just gives a confusing type error. I think it's reasonable to expect the programmer to know the types and arities of the functions used.
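
For instance (sketching from memory; the exact message varies by GHC version):

-- foldr :: (a -> b -> b) -> b -> [a] -> b
-- Omitting the accumulator forces b ~ [c], so GHC complains about a
-- missing `Num [c]` instance instead of a missing argument:
--   foldr (+) [1,2,3]   -- error: No instance for (Num [c]) ...
total :: Int
total = foldr (+) 0 [1, 2, 3]   -- the intended, saturated call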

Note that Popr currently just fails silently, similar to Prolog's `no`, but I intend to add heuristics to detect unexpected failures and report something more helpful, so that `[1 +] popr` would give a useful error message during compilation.

You can see that arities are checked during compilation by trying something like this:

:bc f: [+] pushl popr

The bug noted is intended behavior for now, although it's not ideal. You're right that `[foo] popr` should ideally result in `[...] 5`, but this isn't really useful in practice, and would add a lot of complexity to the compiler.

Re: explicit arity syntax

Given your series of "dip" functions all have explicit arity, I'm rather confused by the apparent contradiction between the claim in your first sentence and the libraries you've developed.

In any case, that wasn't my point. The issue is that popr semantics (as I described them) requires knowledge of function implementation details - the internal arity of component functions - to be of use. As a concrete example, popr can distinguish between `[* 5]` and `[5 [*] dip21]`, which are equivalent as functions but have different implementation details.

Of course, if you insist on "bug noted is intended behavior" and that it should be sufficient to know whole function arity, then you should also ensure `[* 5] popr` fails the same as `[5 [*] dip21] popr` and `[foo] popr` (when foo = * 5) to ensure equivalence of abstractions. Basically, you should limit popr to tuples only.

Let sleeping HACKs lie

While the current implementation can't handle these corner cases correctly yet, that doesn't mean that there isn't a solution. The language works as intended for `[* 5] popr`, and should not be artificially limited just to hide a weakness in the current implementation.

It's just that solving this problem has little practical value, so failing on valid but contrived code is acceptable at this stage.

By the way, a simpler and more concerning example of this problem is:

[1 swap swap] popr

Although I've known about this problem for a long time, I haven't yet hit this limitation while writing any useful Popr code. When that happens, I'll have to work on a solution.

Trust me, there are still plenty of hacks in PoprC (grep HACK), or any compiler for that matter.

About `dipMN`, `apMN`, etc.: these variants exist for different arities because there are very good reasons not to infer arity (as some have tried).

Semantic hacks

There is a qualitative difference between working with "whole function" arity (such as 3 inputs, 2 outputs) vs working with "partial application" arity (such as 1 output after 2 inputs, followed by 1 output after 1 input). The few concrete examples so far don't nearly cover all the possibilities.

You say there are reasons not to infer arity, yet we need to solve the more difficult problem of inferring partial application arity to work safely with popr.

Fixing an implementation bug, such as the issue with foo and `* 5`, won't solve semantic issues.

Alright. I just fixed it.

Alright. I just fixed the bug.

I found a simple solution: fill any unsaturated functions with failure markers and run them anyway, instead of failing immediately.

So now these work:

: [Foo swap swap] popr
  [ Foo swap swap ] Foo
: [5 [*] dip21] popr
  [ ] 5
: [42 swap dip21] popr
  [ ] 42
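
Roughly, the idea in Haskell terms (a sketch of the principle, not the PoprC internals):

data Val = Marker | I Int deriving Show  -- Marker = failure marker

-- `swap` never inspects its arguments, so a marker passes through
-- harmlessly and the 42 can still be popped out:
swap' :: (Val, Val) -> (Val, Val)
swap' (a, b) = (b, a)

poppedOut :: Val
poppedOut = fst (swap' (Marker, I 42))   -- == I 42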

Inferring arity isn't just difficult, it's wrong.

To see why, imagine a higher-order function like:

$: pushl popr swap drop

apply_to_42_and_double: 2 42 swap2 $ *

which should probably apply a quote to 42 and double the result.

This works when the arity of the function parameter is 1 -> 1. But if it were 2 -> 2 instead, and `$` were somehow defined to have variable arity, we could pass in something like `[swap 1+ swap 3*]` and modify both the 2 and the 42, so that apply_to_42_and_double no longer works as expected. It would also break if the function were redefined as:

apply_to_42_and_double: 42 swap2 $ 2 *

So variable arity exposes the internal details of any higher order function.
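
Here is the failure mode in a small Haskell stack model (a sketch; inc and sneaky are illustrative stand-ins for quote arguments):

type Stack = [Int]   -- top of stack first

-- `2 42 swap2 $ *`: push 2 and 42, apply the quote, multiply
applyTo42AndDouble :: (Stack -> Stack) -> Stack -> Stack
applyTo42AndDouble f s =
  let (a:b:s') = f (42 : 2 : s)
  in (a * b) : s'

inc :: Stack -> Stack       -- a well-behaved 1 -> 1 quote
inc (x:s) = (x + 1) : s

sneaky :: Stack -> Stack    -- [swap 1+ swap 3*], a 2 -> 2 quote
sneaky (x:y:s) = (x * 3) : (y + 1) : s

-- applyTo42AndDouble inc    [] == [86]   -- (42+1) * 2, as intended
-- applyTo42AndDouble sneaky [] == [378]  -- 126 * 3: the 2 was hijacked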

Furthermore, it completely destroys the ability to operate on a graph at compile time. Function parameters could consume arbitrary inputs, so the only way to handle this is to assume that they consume all available inputs; likewise, they could produce anything, separating the graph at any point where they are applied.

All of this makes a mess of any generated code, essentially requiring an explicit stack, turning any compiled code into a glorified Forth interpreter.

certain values of 'fixed it'

"a simple solution: fill any unsaturated functions with failure markers and run them anyway, instead of failing immediately"

I think you just changed one bug for another there.

?

Could you please explain?

The solution allows functions like `swap` that don't depend on all of their arguments to still work when possible, instead of just failing as before. It's hard to fully explain the solution without going into details, but it works and passes all the tests.

Trickier test:

: [1 swap swap] dup head 1+ swap pushl pull2 drop
  1 2

preserving behavioral equivalence

In your presentation, you describe the Popr language as functional. Functional programming is most strongly characterized by support for equational reasoning between subprogram expressions. For example, a subprogram expression like `[Blah 3] popr == [Blah] 3`. Yet it seems your proposed fix to `popr` does not even attempt to preserve any useful form of equivalence for the `[Blah]`.

If you cannot partially evaluate without producing an erroneous state, consider simply adding a `drop` to the original input to preserve behavioral equivalence:

[5 [*] dip21] popr == [5 [*] dip21 drop] 5

This is made a little awkward because `drop` in your language is arity 2. But that isn't a big problem.

PoprC is not a term rewriting machine

The contents displayed within a quote after evaluation are not always correct, so they are more of a hint. In my opinion, this is better than printing code that was not in the original expression, and even then, some intermediate states have no direct representation in the source language.

This may give the illusion that PoprC is a term rewriting machine, but it's not; it's a graph reduction machine, like GHC. GHC cannot print the result of (3+), for example.

not just display is incorrect

This isn't just a matter of the display being incorrect. Even if it were, you should display `[?]` as you do for some other queries I've tried. But instead it seems a matter of the state of your closure being incorrect. You fill your non-ready closures with fake arguments in order to partially evaluate them, but AFAICT the resulting closures don't work properly. In concrete terms, here is one example of a resulting failure:

: [* 5] popr [[3 4] dip12 ap21] dip11
  12 5
: [5 [*] dip21] popr [[3 4] dip12 ap21] dip11
bad dep 2749 -> 2774, should be 2747
bad dep 2749 -> 2774, should be 2747
bad dep 2749 -> 2774, should be 2747
BREAKPOINT !!! rt.c:690: mutate: Assertion `check_deps(r)' failed. himb
  - while compiling control.dip11 (512)
!!! rt.c:690: mutate: Assertion `check_deps(r)' failed. himb

Edit: Perhaps I've got something else wrong? I can make it work for `[5 [*] dip12] popr swap 3 swap 4 swap ap21`. But even so, it seems to me both of the above should either fail or succeed the same way!

The simplest solution I see is to just dup your non-ready closure, extract your `popr` result from one copy, then append `drop` to the other. But this isn't the best solution, since it may result in rework.

It's a work in progress

Eh. It's fixable, at least; I generally know what needs to be done.

Note that the compiler's arg_nd function non-destructively adds an argument to a graph by copying on writes, and the modified closure should only be used for a single reduction. So it essentially does what you've suggested, but apparently this didn't solve everything.

re: Inferring arity

The type for stacks and higher order functions like `dip` can be described in a conventional type system.

dip : forall s a s' . ((s * a) * (s -> s')) -> (s' * a)
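
(As a sanity check, this type transcribes directly into Haskell, with the stack modeled as nested pairs:)

-- dip hides the top element, runs the function on the rest of the
-- stack, then restores the hidden element:
dip :: ((s, a), s -> s') -> (s', a)
dip ((s, a), f) = (f s, a)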

If you can provide a static type for each of your language primitives, then you can infer static types for many composite programs using the normal inference methods. And although generic dip logically receives the 'whole stack' as an argument `s`, it can often be specialized and optimized by a compiler based on static type of the function parameter.

If ambiguities or inefficiencies occur, we can still leverage type annotations. Compare `(t21) dip` to `dip21`. The former allows us to separate the simple type annotation `(t21)` from the more generic `dip`, and to conveniently describe our type assumptions apart from applying functions.

In any case, I'm curious what type you'd assign to `popr`. You say that arity inference is bad, but safe use of `popr` also requires arity inference - in a strictly more general form, involving partial evaluations.

Arity inference just won't work

Type annotations are still required to resolve the problem demonstrated with apply_to_42_and_double, which means that type inference will only work if the quote arity can be fixed somehow, such as by using a quote literal.

Optional optimizations can't be relied on by a programmer, so operations on a stack still make a mess of the program's semantics.

Type inference is a restricted kind of partial evaluation (PE), and PoprC is a partial evaluator, so it uses PE for type inference as well. Popr is also lazy, so it is polymorphic in any value that is not reduced. This is quite different from other methods, which are essentially strictly evaluated.

This has some interesting results, such as requiring fixed arity only at the top level, or allowing silly expressions such as this to typecheck:

: 7 42 Foo + drop
  7

Here are the type rules.

re: arity inference part nauseum

$    : ∀ s s'. (s * (s → s')) → s'
swap : ∀ s x y. ((s * x) * y) → ((s * y) * x)
42,2 : ∀ s. s → (s * N)
*    : ∀ s. ((s * N) * N) → (s * N)
(t11) : ∀ s i o. (s * (∀e. (e * i) → (e * o))) → (s * (∀e. (e * i) → (e * o)))

42 swap $ 2 *       : ∀ s s'. (s * ((s * N) → (s' * N))) → (s' * N)
(t11) 42 swap $ 2 * : ∀ s. (s * (∀e. (e * N) → (e * N))) → (s * N)

The arity of `42 swap $ 2 *` is parametric, depending on the type of the function argument. Yet it is still subject to inference. Adding the arity annotation to restrict the scope of the function argument does simplify the type and local optimizations, but is not essential for inference or program specialization. (That is, you could specialize `apply_to_42_and_double` for multiple function arities based on what's used statically within a larger program, much like generic templated code.)

In any case, static safety and some optimizations of `popr` operations depend on inference of arity under partial applications. This is a problem more general (and more challenging) than whole-function arity inference. Unless you successfully implement partial application arity inference, `popr` will require dynamic evaluators with a runtime failure option, much like the ones you were disparaging with regard to Forth.

Consequently, I feel your entire argument and position against arity inference, even if we simply assume it to be valid, is ultimately inconsistent with your language design decisions. Every time you argue against arity inference, you're effectively arguing against `popr`. You just don't seem to have understood the connection yet.

nope

Nothing ever really happens inside a quote. It's essentially a container for captured values and virtual code that gets inlined when unevaluated expressions are popped out.

PoprC is exactly as lazy with type inference as it is with evaluation. Expressions that can't be evaluated don't need types.

Essentially, after compilation, all that remains in a quote is data, because all function evaluation happens outside quotes.

problems with popr, rephrased

I feel I've still not communicated my point. The bug was perhaps too much of a distraction, but ultimately irrelevant. I'll try another approach.

Take the notation `llrlrr` as shorthand for the long-winded description "we must pushl twice before we can popr once, then we must pushl once before we can popr twice".

For functions of arity 3 with 3 results (call this 3-3), depending on data dependencies within the function, we can distinguish 10 potential arities for partial application: lllrrr, llrlrr, llrrlr, lrllrr, lrlrlr, lrrllr, rlllrr, rllrlr, rlrllr, rrlllr. In general, for functions of arity A-B, we can compute the number of partial-application arities as: `(A+B-1) choose A`. It's combinatorial.
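
(A quick executable check of the formula, in Haskell:)

-- interleavings of A pushls and B poprs that end in a popr:
arities :: Integer -> Integer -> Integer
arities a b = choose (a + b - 1) a
  where choose n k = product [n - k + 1 .. n] `div` product [1 .. k]
-- arities 3 3 == 10, matching the ten sequences listed above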

Your bugfix addresses false data dependencies. For example, in `[5 [*] dip21]` the first output unnecessarily awaits the two inputs, masking `rllr` as `llrr`. But we cannot fake our way past true data dependencies! The bugfix doesn't even touch my main issues.

The existence of the `popr` operator pervasively exposes implementation details about data dependencies to all potential clients of all functions. Simply, if we apply `popr` too early, our evaluator will fail. Static typing is all about preventing runtime failures. If you intend to avoid runtime failures at `popr`, you'll need to infer partial application arities for all your functions. That includes quoted functions! (You describe type inference as "lazy" above, but you also mention static analysis in section 3.5 of your slide show.)

Explicit partial evaluation is not difficult. For example, the `llrlrr` partial application arity type can be explicitly modeled as `a → b → (c * (d → (e * f)))`. Doing so has some modularity benefits both because we've avoided exposure of implementation details and we've made our assumptions explicit and precise so we won't accidentally break things later. I grant being explicit about partial evaluation is more "noisy" than use of `popr`, but that can be mitigated with abstractions. And I'm not convinced it's more expensive, especially after systemic costs of `popr` are considered.
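
In Haskell notation, for example, the staging can be captured as an ordinary type; `f3` below is a hypothetical inhabitant, just to show the shape:

-- llrlrr: two inputs, one output, one more input, two final outputs
type LLRLRR a b c d e f = a -> b -> (c, d -> (e, f))

f3 :: LLRLRR Int Int Int Int Int Int
f3 a b = (a + b, \d -> (d * a, d * b))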

With regards to inferred or explicit arity, by my analysis the system with `popr` seems to be strictly at a disadvantage. Contrast:

  • without `popr`: whole-function arity inference or occasional annotations for type safety
  • with `popr`: fine-grained partial-application arity inference for type safety, pervasive explicit arity for operations like `dip` and `ap` to isolate data dependencies

Even if you disagree with my conclusions, I hope I've presented my position more clearly this time. So far, I feel we've been talking past each other.

I think I get it

So right now, the ordering of pushl and popr matters, but I can fix that as well, so that this would work:

: 2 [1+] popr [pushl] dip21
  [] 3

This works because popr doesn't actually force anything; we are just building a pipeline, and it works okay as long as there are no dependency loops.
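
A lazy-evaluation sketch of the same idea in Haskell (my model, not the compiler's graph machinery): the popped output is just a thunk over an input that arrives later.

-- mirrors `2 [1+] popr [pushl] dip21  =>  [] 3`
poprThenPushl :: Int -> Int
poprThenPushl x =
  let out   = input + 1   -- popr: take the output of [1+] first
      input = x           -- pushl: the argument is supplied afterwards
  in out
-- poprThenPushl 2 == 3; only a genuine dependency loop would diverge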

This would mean that `llrlrr` and all the other variations are interchangeable.

I don't think it will be that hard to fix, similar to my earlier solution. One concern is that this allows loops in the graph, but this is already possible:

: [dup] popr swap pushl popr

What I mean by lazy type inference is not that it is delayed until runtime. I mean that type information is only inferred where necessary, which means that types are restricted only as much as they need to be. This ends up being a generalization of Hindley-Milner's most general type, I guess. It's just interesting how type variables and laziness are related.

Thanks for the discussion! It has been interesting and useful. I'll post when I have a fix.

popr wormholes?

Your suggested fix, or at least my impression of it, has the flavor of promises or futures, moving data through hidden channels.

I've contemplated similar features. But the difficulty I ran into is twofold. First, what happens if you dup or drop the closure before providing the pushl? Second, how can we reason locally and refactor functionally when each use of the expression `[1 +] popr` has distinct observable behavior and object identity based on subsequent expressions?

I didn't find answers that satisfied my goals. Perhaps you'll not be troubled, e.g. you already have non-determinism. But I think the questions are worthy of attention.

Ordered pushl/popr is necessary

I tried to implement "wormholes", but ran into the mechanisms that my smarter past self, having thought about this problem long ago, put in place to prevent them.

Adding an argument modifies a closure, so the compiler first copies any part of the graph that can reach the closure where the argument will be attached.

The problem is that popr can leave dangling references (called deps) if a dep is removed from a quote; this is the bug you saw. For example:

: [dup] popr swap 1 swap dup swap [ap21] dip32 2 swap ap11
bad dep 2246 -> 2495, should be 2245
BREAKPOINT !!! rt.c:677: mutate: Assertion `check_deps(r)' failed. pmca
  - while compiling control.dip32 (320)

Each dep forms a bidirectional link, so the compiler throws an error whenever these links are inconsistent.

So this is a limitation of Popr's semantics. Still, it's strictly more functional than treating quotes as tuples, so if it's too hard to track when it is safe to use popr, the programmer can just use quotes more conservatively by saturating them before the first popr.

On the other hand, if you make a mistake, the compiler will just tell you. Note that the "interpreter" is just the compiler compiling a specific expression to a specific result.

For example, this gives a compile time error:

: :bc f: [2 + 3] ap02 swap2 pushl 
 
___ eval.f (0 -> 2) FAIL ___

It's just a really bad error that basically says that the resulting function will always fail.

This works:

: :bc g: [2 + 3] pushl ap02
 
___ eval.g (1 -> 3) ___
[1] var, type = ?i x1
[2] __primitive.add 1 3, type = i x1
[3] val 2, type = i x1
[4] val 3, type = i x1
[5] return [ -4096 2 4 ], type = r x1

The order of pushl/popr won't affect the generated code at all, as long as it doesn't violate dependencies; violations raise a compile-time error.