Why Concatenative Programming Matters

Jon Purdy riffs on Hughes's famous "Why Functional Programming Matters" with a blog post on Why Concatenative Programming Matters.

A Few Critiques

I'm responsible for the rather awkward definition of "concatenative programming languages" that Jon claims somewhat misses the point. He's right. There's a discussion on the concatenative mailing list about how we might be able to fix it; it also includes some of my criticisms of concatenative languages. (Set your browser's encoding to UTF-8 if you want the symbols in the FL code to not be hopelessly mangled.)

I want to take issue with a few points in the article.

However, this does not mean that function application becomes explicit—it actually becomes unnecessary. And as it turns out, this peculiar fact makes these languages a whole lot simpler to build, use, and reason about.

As much as I love them, there's really no evidence at all that concatenative languages are easier to use or reason about. The main issue is that the equational reasoning you can do ends up being quite weak as the entire state of the program is threaded through every function. The result is that, syntactically, there is no way to determine logical dependencies between parts of a function and hence reasoning about subexpressions is quite limited. It's also somewhat easy to accidentally introduce a dependency where there shouldn't be one by smashing part of the program state you didn't mean to touch.

So already, concatenative languages give us something applicative functional languages generally can’t: we can actually return multiple values from a function, not just tuples.

This is not really true. You can view functions in a concatenative language as functions from lists to lists which are certainly possible in most functional languages. For example, we can write a concatenative-style length function in Haskell just by using unit (for the empty list) and nested tuples (for cons). First, here's the setup:

{-# LANGUAGE NoImplicitPrelude #-}
{-# LANGUAGE TypeOperators #-}

module Concatenative where

import qualified Prelude as P
import Prelude ((.), (+), flip, snd, Bool, Integer)

infixl 9 !
(!) = flip (.)

infixl 9 :*
type (:*) = (,)

-- quotation
q :: b -> a -> a :* b
q = flip (,)

-- forall A B. A [A -> B bool] [A -> C] [A -> C] -> C
ifte :: a :* (a -> (b, Bool)) :* (a -> c) :* (a -> c) -> c
ifte (((s, c), f), g) = if snd (c s) then f s else g s

-- forall A b C. A b [A -> C] -> C b
dip     = \((s, a), f) -> (f s, a)

-- forall A b. A b -> A
drop    = \(s, a) -> s

-- forall A. A integer integer -> A integer
add     = \((s, a), b) -> (s, a + b)

-- forall A b. A (list b) -> A bool
null    = \(s, xs) -> (s, P.null xs)

-- forall A b. A (list b) -> A (list b)
tail    = \(s, xs) -> (s, P.tail xs)

-- forall A. A -> A integer
zero    = \s -> (s, 0)
one     = \s -> (s, 1)

With all that mess, we can now write our tail-recursive length function; the type signature for 'length' is optional:

-- length : all A b. A (list b) -> A integer
-- length = [0] dip loop where
--   loop = [null] [drop] [[1 +] dip tail loop] ifte
length :: a :* [b] -> a :* Integer
length = q zero ! dip ! loop where
  loop = q null ! q drop ! q (q (one ! add) ! dip ! tail ! loop) ! ifte

-- xs == ((), 5)
xs = length ((), [0, 1, 2, 3, 4])

Hopefully this demonstrates that there's not much "special" going on in a typed concatenative language. If you write 'q' using square brackets and make ':*' and '!' silent, you're almost all the way there. Ideally, you'd finish it with a kinding restriction such that the left side of a pair was always another pair or the unit type; unfortunately, this restriction is not expressible in Haskell, so you just need to be careful to respect it.
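(Aside, and not something the code above relies on: you can approximate that restriction with a class constraint rather than a kind, at the cost of threading the constraint through every signature. A rough sketch:

class StackShape s
instance StackShape ()
instance StackShape s => StackShape (s, a)

-- e.g. a literal-pushing helper that only accepts well-formed stacks
pushOn :: StackShape s => a -> s -> (s, a)
pushOn x s = (s, x)

It's weaker than a real kinding restriction -- nothing stops someone from adding a bogus instance -- but it does catch accidental misuse.)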

(You will also want support for polymorphic recursion for functions that make use of things like linear recursion instead of pure iteration as above. You can probably figure out why this is so. Of course, GHC already handles it nowadays.)

They are amenable to dataflow programming

This nice property breaks down when you get into higher-order programming. This is the reason we've not seen any visualizers for languages like Joy; I'd have written one by now otherwise.

You can easily roll your own control structures

I'm not sure this is true at all; or, at least, not any more easily than you can in something like Haskell. My 'length' above is not too bad, but it gets quite a bit more gnarly when you're writing higher-order functions. As an exercise, try writing 'map'. Even if you write 'map', you need to be careful about how you write it. In particular, you need to restrict the function passed in to using no more than one element on the stack; otherwise you'll really have a map/accumulator and the usual map fusion rule won't hold. This sort of restriction is possible both via types and in an untyped context.
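To make the restriction concrete, here's a sketch (mine, not from the article) in the nested-pair encoding above: give the quotation a private unit stack, so its type stops it from reaching anything below the element being mapped.

-- Hypothetical 'mapQ': the quotation is typed ((), a) -> ((), b), so it sees
-- exactly one element; that is what keeps it a genuine map rather than a
-- map/accumulator.
mapQ :: ((s, [a]), ((), a) -> ((), b)) -> (s, [b])
mapQ ((s, xs), f) = (s, map (\x -> snd (f ((), x))) xs)

-- increment every element; the rest of the stack is untouched
mapQExample :: ((), [Integer])
mapQExample = mapQ (((), [1, 2, 3]), \((), x) -> ((), x + 1))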

I've been writing about this stuff to a small audience for so long now that I've probably made plenty of inappropriate assumptions. Feel free to ask for clarification where necessary.

Ease of reasoning

I'd agree that concatenative languages are not easier to reason about, if the author had meant that they're purely superior. But he didn't. They are easier in some ways, harder in others.

You correctly cite the fact that the entire state is implicitly passed to all functions. This is partially true; if the compiler doesn't care, the entire state can be accessed by any function. But this need not be so. Obviously, a strongly typed language can ensure that a function accesses only the part of the stack for which its type is suited. Less obviously, there are other data structures in addition to the stack that can be controlled in other ways. ANS Forth specifies floating point, but leaves it as an implementation issue whether there is a separate floating point stack. Suppose there were one; then none of the words which didn't use the FP stack would need access to it.
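For what it's worth, the separate-stack idea is easy to express in the nested-pair Haskell encoding from the comment above; a minimal sketch (my names, nothing to do with ANS Forth's actual words):

-- The environment carries a data stack and a float stack side by side.
type Env ds fs = (ds, fs)

-- Integer addition is typed against the data stack only; the float stack
-- flows through as an opaque type variable.
addInt :: Env ((ds, Integer), Integer) fs -> Env (ds, Integer) fs
addInt (((ds, a), b), fs) = ((ds, a + b), fs)

-- Float addition likewise never sees the data stack.
addFloat :: Env ds ((fs, Double), Double) -> Env ds (fs, Double)
addFloat (ds, ((fs, x), y)) = (ds, (fs, x + y))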

Now, an FP stack of this nature is so trivial it doesn't matter. A more sophisticated problem would be raised by words that perform something like return stack manipulation. It's possible for such words to be typechecked and to be determined compatible (ANS Forth defines a few basic types, although probably without the rigor people here would expect).

Forth also has a dictionary whose use could and should be checked (since runtime dictionary access for the purpose of creating named definitions is a very unusual action).

Obviously, a strongly typed

Obviously, a strongly typed language can ensure that a function accesses only the part of the stack for which its type is suited.

Yes, but it conceptually is passed and the syntax reflects that. That's what I mean by "syntactically, there is no way to determine logical dependencies".

Parametricity

Indeed, this would simply be an instance of parametricity. Except that (1) that is a non-trivial property and needing it severely complicates proofs of things that would be simple otherwise, and (2) nobody has actually proved parametricity properties for any of these languages AFAICT, and (3) as soon as you add reflection capabilities it's out the window.

In terms of security this is also pretty bad, because the stack essentially provides unrestricted ambient authority.

All that is true and not

All that is true and not true. The syntax "reflects" that everything is passed, because the syntax is, by definition, an exact reflection of a purely associative semantics (which does require passing everything). But real languages (i.e. anything other than zeroone) add other structures on top of the purely associative syntax (for example, the nested square brackets most concatenative languages use in imitation of Joy), so those real languages could also add syntaxes that reflect other types of access.

ColorForth isn't at all a pure language by any definition, but it provides an example of this by tagging words when they write to the dictionary, as well as when they perform immediate actions. That "tag" is actually a special character prepended to the word, and it's only the editor's implementation that hides the special character and colors the word. Obviously, a tag like this is best used with effects that occur relatively rarely, unless you add parentheses and a distributive tag.

Manfred von Thun recently defined a language called "floy" (Flat Joy) in which he exposed a formerly internal structure of Joy called (IIRC) the "annex", where quotations get built. These annexes give the real language its tree structure, and in the real Joy language they're "typechecked" to keep quotations balanced and to make annexes appear only while the compiler is executing, never at runtime.

-Wm

Arrows

There are styles of concatenative programming that do not pass the whole program state around, that structurally constrain what each subprogram sees. Arrows seem to be an effective basis for this, though they'd still need to be accompanied by an appropriate vocabulary.

Analogy Between Concatenative Languages and Arrows

In my untutored way I've wondered about this question before. Can you elaborate (or point to an elaboration) on the relation between concatenative languages and arrows?

If you like, you can view

If you like, you can view '(>>>)' as composition and 'first' as 'dip' in terms of how you tend to use them to do concatenative programming.
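Slightly expanded, a sketch (with made-up helper names) using the plain-function Arrow instance and the nested-pair stack encoding from the Haskell example above:

import Control.Arrow ((>>>), first)

-- push a literal onto the stack
pushL :: a -> s -> (s, a)
pushL x s = (s, x)

-- add the top two integers
addL :: ((s, Integer), Integer) -> (s, Integer)
addL ((s, a), b) = (s, a + b)

-- "1 2 add" with (>>>) playing the role of concatenation
threeL :: s -> (s, Integer)
threeL = pushL 1 >>> pushL 2 >>> addL

-- 'first addL' reads like "[add] dip": addL runs on the stack below the top element
dippedAdd :: (((s, Integer), Integer), b) -> ((s, Integer), b)
dippedAdd = first addL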

Point-free style, Tacit programming

In other communities concatenative programming might be identified by the words `point-free style` or `tacit programming`. These are styles that focus on composing operations rather than naming variables. Though it is still sometimes convenient to use variables locally and desugar to a point-free model.

Arrows extend simple function composition with some additional structure and, importantly, operations on just part of that structure - thus enabling developers to enforce that certain subprograms only interact with certain parts of the structure. This addresses the above complaint about syntactically controlling relationships in the program.

Anyhow, if you shun use of the local variables to desugar back to arrows, arrow programming will be point-free, tacit, very compositional. In Haskell the composition will be type-safe - you can't just concatenate any two programs, but you can do so whenever the types match up.

Generalized arrows enforce the arrow structure far more strictly than the Hughes/Paterson arrows originally introduced in Haskell (which heavily use `arr` to lift Haskell functions). Generalized arrows are very useful if you want to embed a language that isn't a superset of Haskell.

In other communities

In other communities concatenative programming might be identified by the words `point-free style` or `tacit programming`.

Not quite. I think it's important to consider concatenative languages as a subset of pointfree languages. At a minimum, there is no form of application (as there is in J or Backus's FL). As I've said elsewhere in these comments, I think a more useful definition goes a bit beyond this however.

Application as Staging

Even for purist concatenative programming, application is possible in a manner similar to ArrowApply. Function and operand are part of the state, and you apply one to another and replace the state with the result. Something like `3 f apply` where `f` puts a function on the stack. Application, in this sense, becomes conflated with staged metaprogramming, separating the construction of the program from its execution.
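In the nested-pair encoding from the Haskell comment above, such an apply is a one-liner (a sketch; 'applyTop' is a made-up name):

-- the quotation on top of the stack is applied to everything below it,
-- much like ArrowApply's 'app' for plain functions
applyTop :: (s, s -> t) -> t
applyTop (s, f) = f s

-- "3 f apply" where f pushed a doubling quotation: evaluates to ((), 6)
applyExample :: ((), Integer)
applyExample = applyTop (((), 3), \(u, n) -> (u, n * 2))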

So constrained use of parameters can be acceptable as a form of syntactic sugar. The relevant constraint is that the parameters can be computed statically. This is close to templated functions or macros, eliminating the need to statically write many variations of functions. The staging results in a very different programming experience than application in normal functional programming.

Of course, too much sugar would detract from the concatenative flavor. But developers will quickly find the right balance, being guided to it by the strict staging. Developers won't be able to `concatenate` programs at just any darn point, but it will still be clear to them where they can do so, and they'll still emphasize compositional reasoning with regards to the runtime behavior of the program.

Meanwhile, it won't take many statically parameterized operators - such as (***) from Arrows - to effectively and syntactically control access to different parts of the runtime state. Even if primitive, such operators will fit right in with the other uses of static parameters.

Even for purist

Even for purist concatenative programming, application is possible

Sure. The point though is that there's no syntax for it.

So constrained use of parameters can be acceptable as a form of syntactic sugar. The relevant constraint is that the parameters can be computed statically.

Yes, and Factor does this. The translation is not too bad. FL also does this with a quasi-matching syntax so you can write 〚x, y〛→ e and x and y will be translated to s1 and s2 (the list selectors) respectively in e.

But developers will quickly find the right balance

I think they already have in Factor. Variables are used (and translated away) for a very small percentage of functions.

Sure

There are styles of concatenative programming that do not pass the whole program state around

Sure, and you can do this in "normal" concatenative languages as well; I was just generalizing a bit. For example, let's suppose a combinator called 'infra' which applies a quotation to a stack on the stack:

  {A} [B] infra  ==  {A B}

We can use this to write a function 'bi*' (choice of name not mine) that, given two quotations, applies the first to all elements below the top non-quotation and the second to only the top non-quotation; the second quotation is restricted to accessing at most one argument. In other words, we have the following semantics (where parentheses denote independent expressions and lowercase variables bind single elements):

A x [F] [G] bi*  ==  (A F) (x G) 

Here's the implementation:

bi* =
  [dip] dip          -- apply the first quotation 
  [{} swap push] dip -- group the top non-quotation value into a substack
  infra              -- apply the second quotation to the substack
  concat             -- concatenate the main stack with the substack
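For comparison, the same shape can be written directly in the nested-pair Haskell encoding from earlier in the thread (a simplified sketch of mine, in which the second quotation consumes its one argument and leaves one result):

-- A x [F] [G] bi*  ==  (A F) (x G); G's type confines it to a one-element
-- substack, so it cannot reach below x.
biStar :: (((s, a), s -> t), ((), a) -> ((), b)) -> (t, b)
biStar (((s, x), f), g) = (f s, snd (g ((), x)))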

You can also use 'infra' commonly to limit how many values a quotation passed into a higher order function may use. For example, you could imagine an 'apply_n' combinator written in terms of 'infra' that, given a quotation and a natural number representing the number of arguments it should use, applies that quotation to exactly that number of arguments and concatenates the result back onto the main stack; if the quotation attempts to use too many arguments, you'd get a stack underflow. Obviously you can also do this statically with types to push the error to compile-time.

Of course, none of this solves the problem that independence of subexpressions is not syntactically apparent. Perhaps this is not so bad if you have a good set of well-known and sufficiently general combinators that limit views of the main stack appropriately. So far as I know, no one has really experimented with this in real programs.

I am not a stack

The program state doesn't need to be a `stack`. That is how FORTH does it, of course, but with Arrows the program state is easily a (rich,((product,of),values)) + (or,(a,sum)). Some of those values could be stacks, of course, but that isn't necessary.

If you eliminate the assumption or preconception that there must be a `main stack`, that frees you up to explore design spaces where independence of sub-expressions is more syntactically apparent.

I know at least one designer who chose to shun `first f` in favor of `f *** id` in order to make the structure more syntactically apparent, especially as he was exploring the possibility of tile-based composition.

A Definitional Question

The program state doesn't need to be a `stack`.

Well this is sort of a definitional question. All languages that claim to be concatenative that I'm aware of either are stack-based or have some effectively similar scheme (e.g. Enchilada's parallel term rewriting or XY's stack + continuation model). I've personally played quite a bit with alternative models but nothing particularly successful. I remember someone presenting a rather disturbing language based on a deque instead of a stack actually...

There are certainly pointfree programming approaches that avoid the problem but I'm not sure I'd call them concatenative. For example, Backus's FP makes use of the "construction" combining form which is essentially an n-ary special syntax for '(&&&)'. I've wanted to add an n-ary special syntax for '(***)' called "combination" as well.
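In Haskell terms (my gloss), the two combining forms line up with (&&&) and (***) for plain functions:

import Control.Arrow ((&&&), (***))

-- "construction": one input, fed to every function
construction2 :: (a -> b) -> (a -> c) -> a -> (b, c)
construction2 f g = f &&& g

-- "combination": each function gets its own slot of the input
combination2 :: (a -> b) -> (c -> d) -> (a, c) -> (b, d)
combination2 f g = f *** g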

I wouldn't call FP concatenative though even if composition were written via juxtaposition. I think for the term concatenative to be useful, it needs to mean more than just "composition is juxtaposition"; I think it also implies something about functions accepting and propagating "additional" arguments by default. The link in my original post covers this fairly well.

Concatenative means more

Concatenative means more than `composition is juxtaposition`. It suggests a heavy emphasis on composition, to the exclusion of other programming styles. The property I'd be looking for is that operations are closed, which leaves us free for immediate composition. Operations like (***) and (&&&) or even `first` aren't very `closed` unless they operate on elements already in the program state. Though, a few standard such operators won't detract much from the concatenative feel.

An easy way to keep operations `closed` is to place operands in a stack or other state, and pass the full state from one operator to another. But is this necessary? What other options are there? Maybe staged programming, which partitions state over time. Or object capability model, which partitions state over space. Perhaps the issue is that plain old `quotation` is not enough for the control we desire.

I've only idly pursued concatenative programming. Emphasis on composition, more generally, is the real win. I believe we'll want multi-dimensional composition (e.g. parallel, choice, staging, sequential, extension), but concatenation seems to imply keeping things in a single-dimension.

According to Manfred, I

According to Manfred, I coined the term "concatenative" (I don't remember doing that); and I never intended it to be exclusive. I did coin the word "flat" with the intent of describing PURELY associative languages, i.e. languages which always map juxtaposition to composition. The difference is that in concatenative languages, you can build programs by joining valid programs; in a flat language you can also split any valid program and always get two valid programs.

So... I agree with your motives here. I'd like to see what comes of exploring your ideas.

[EDIT:] but I disagree with the specific claim that there's an innate exclusion of other possibilities.

-Wm

Useful characterisation?

Is that a useful characterisation at all? Assembly language is concatenative and flat too, then, and so is BASIC.

OK, what you're claiming, I

OK, what you're claiming, I think, is that assembly and BASIC are, in some respect, impurely associative over their lines. Let's take this as true.

But BASIC's lines are parsed; there is therefore a practically unbounded number of possible lines (I admit that I don't know the maximum number of characters per line of any BASIC, though). So calling BASIC concatenative is not a useful characterization -- it gives no analytical power, because your splitting point doesn't split apart any of the interesting parts of the language.

Assembly language is different, since in spite of a little bit of parsed syntax it does display SOME limitations on the number of possibilities per line. Machine language is a better idea, though, since it doesn't include the parsed flourishes an assembler can allow; with machine code you can consider the number of 1-byte opcodes (something less than 256) plus the number of 16-bit opcodes (obviously less than 2^16), and so on. The sum is a bit high, but it's manageable. A concatenative assembler is conceivable, even if you allow the registers to be parsed.

There are, of course, stack-based assemblers that are traditionally provided with Forth implementations; I find them pleasant to work with in comparison to the traditional assemblers. (I'm not, by the way, confusing a stack-based assembler with the question of whether the assembly language is concatenative.)

The trouble becomes that the assembly language is very _poorly_ concatenative; the associative property might be said to hold, but only when you keep in mind a huge list of rules, since so many operations act on hard-wired registers for which the correct value depends on previous operations. To TRULY model an associative assembler, you have to build a model that doesn't match the real hardware at all. So although you CAN model assembler in a concatenative manner, it's of very limited use in terms of using the strengths of either the machine or of the concatenative properties of the language.

On the other hand, you can very easily build a partially concatenative VM that comes _very_ close to the properties of the machine.

-Wm

Why bytes?

So concatenative has to apply on the level of bytes, not tokens, lines, or statements? That strikes me as a rather random and low-level definition. How are such accidental details of syntax useful as a classification principle for languages, whose main concern should be semantics?

OK, a less detailed answer to your assembly question

I answered your previous question in detail because your other comments here show that you understand computing with detail and nuance. So it feels like you're "phoning it in" with this one, or perhaps simply being hostile.

To answer your previous question without any detail: no, BASIC is not concatenative, because it has a rich applicative structure. Ditto for most assembly languages, although in some ways assembly is closer to being concatenative -- but as you understand, I can't explain that concept without getting into the low-level details of assembly language, which you've now revealed are taboo.

As for your current question: the point is that the semantics and the syntax of a concatenative language both must be associative. (Semantics are not the ONLY point of language design; the mapping between semantics and syntax is important too, as is the syntax itself.) If they're not both associative, it means you can't apply associativity to random text from the middle of the program until you carefully analyze it. Now, if your impurely concatenative language has a simple and clear way to tell when associativity doesn't apply (for example, in all real languages you can't play with letter association, you have to lex the words out first; and in Joy and other higher-level languages there's a nestable [syntax] that doesn't itself follow the associative rules), you can still use the associative property without too much analysis.

-Wm

What is "program text"?

I apologise if this came across as hostile. But I'm truly mystified.

You talk about program "text". But the definition of "concatenative" seems to hinge entirely on the level of syntactic abstraction or granularity that you are looking at it -- and you haven't really motivated why any given level is more relevant than any other. E.g. it seems to me that BASIC statement sequences are associative, syntactically and semantically.

Why does it matter that they have inner structure? You can always destructure syntactic "atoms" on one level into something even smaller. Until you have arrived at individual bits, I suppose. So where to stop, and why there? More importantly, why does it matter, i.e. what is the "analytical power" you are referring to?

Maybe I miss something completely...

Entanglement

When we have a lexical scope, statements are often tied together by it. You cannot readily separate - change association of - the code that uses a name from the code that declares it. For BASIC, or at least the variations I've used, this would hinder it from being `flat` in the sense described initially.

I think William has muddled the issue starting with trying to grant your premise ("what you're claiming, I think, is that assembly and BASIC are, in some respect, impurely associative over their lines. Let's take this as true.")

For a program to be `usefully` concatenative or flat, I think you'd need a useful definition for `valid program`. The lack thereof limits utility for flatness of assembly or machine language.

All good points; you're

All good points; you're right.

Except one minor problem. The problem with machine language and assembly isn't that we don't know what a "valid program" is; of course we can define that. The problem is that understanding three machine-language programs doesn't always help you understand their concatenation, because they have impure side effects that aren't associative.

-Wm

Impure?

In what sense is modifying a register or an absolute memory address less pure than modifying a relative stack slot (that, depending on context, may not even exist)?

Stack items may be

Stack items may be immutable, even though the array that usually underlies a stack is mutable (and note that Joy uses an immutable list instead of a mutable array, which it uses to backtrack to previous stack states). Therefore, a stack language is normally pure inasmuch as it's point-free -- even in a language like Forth that makes NO attempt to be pure.

With that said, though, I wasn't trying to compare languages; I used "impure" to refer to machine language instructions. I think you'd have a hard time disagreeing with that.

Old-school BASIC

To clarify, I was referring to the BASIC of the Good Old Times, with no control structure, just GOTO. Not the bastard children of today. ;)

It's refreshing to meet a

It's refreshing to meet a fellow curmudgeon. I agree with you entirely -- although old-time BASIC needed improvement, so do all languages; you don't get to steal a language's name and put it on a totally different language.

Humbug. Modern BASIC is a contradiction in terms.

Grin.

Any PL that attracted

Any PL that attracted a passionate fan base (wide use doesn't require fans) must imho have had something beautiful in its elemental design. Ideally, the next generation of language designers would identify the beauty and find a way to merge it with new ideas; instead, typically they say "that language is popular, therefore we should imitate its syntax", and so we inherit from the old languages syntax rather than beauty. Enumerating disadvantages of vintage BASIC is uninteresting; what's interesting is trying to understand what's beautiful about it.

Thank you for clarifying. My

Thank you for clarifying. My problem was when you asked about assembly but then criticized me for giving a low-level answer (to a low-level question). It seemed like I'm putting a TON of effort into answering questions, and you had put NO effort into objecting to my answers.

The problem with ignoring inner structure is that in order to analyze a program, you have to know precisely what the atoms of the program mean. If you're examining BASIC and defining the statements as atoms, you've got a language with a practically unbounded number of possible atoms -- far more than you can usefully consider. In addition, many of the "atoms" have an effect on the state of the machine that is non-associative in nature, so the resulting statements may not actually have associative semantics with their neighbors (although they might!). You're actually ignoring structure that would be useful if you considered it on its own terms -- as an impurely applicative language.

A simple example of inner structure that preserves the concatenative structure of the program containing it is the "number" parser that Forth provides. Although such a parser generates a large number of possible atoms, they all produce the same kind of program effect -- pushing a specific number onto the stack. The inner non-concatenative structure is simple and maintains the outer associativity, even though its syntax is not associative internally.

On the other hand, you can almost always usefully analyze a language while ignoring outer structure -- you simply limit your analysis to the area contained by the outer structure. The "[" and "]" characters are a very common way to denote outer structure in concatenative languages ever since Joy.

Yes, you can always break down words further until you reach the bit level. For this exact reason I wrote a purely concatenative language at the bit level -- it has only two functions, zero and one. Aside from such self-imposed silliness (grin), the point is that one still has to analyze the programs written in the language by starting with the syntax of the language, mentally following the language's defined mapping from syntax to semantics in order to get a mental model of the program's semantics.

One advantage of a purely concatenative language is that, because both the syntax and the semantics are everywhere associative, you can start at any point inside the program's text, and know that the semantics you figured out are correct in the big picture. This allows you, then, to not need to figure out hard parts of the program until you've figured out the rest. This means you don't even have to _parse_ the program.

For example, in my original 1-bit language, the bit sequence "01" happens to mean "drop" (discard the top item from the stack). Thus, that sequence ALWAYS means "drop", no matter where it appears. Yes, my original language was VERY wasteful, constantly throwing data away -- but in that respect it was easy to analyze. From this fact, I was able to immediately tell that the shortest nontrivial no-op was "0010101", since I knew that "0" puts 3 items on the stack, and each "01" dropped one of them; I didn't have to KNOW what each of those 3 items actually did, since I knew that they were all discarded.

In a PRACTICAL language, unlike the toys I design, breaking down programs to the bit level, and even the character level, won't assist in understanding. All practical languages define their meanings at the lexeme level. Thus, this is where you MUST stop. Now, depending on the nature of your analysis you MAY stop earlier; but lexemes are usually defined so that their boundaries are easy to see. (I admit that BASIC statements meet this definition because they're each on a line by themselves, but I wasn't complaining about your question; I simply pointed out why it was a problem to do that, even though it's possible to do it.)

So, let's see.

"Why does it matter that they have inner structure?" Because their inner structure is giving you useful clues that you don't want to throw away.

"So where to stop, and why?" Stop at the level that helps your analysis, and describe the language based on the properties that hold at that helpful level; alternately, stop at the level appropriate to the type of language you're examining, so that your analysis can be reasonably low effort.

"What is the analytical power [I am] referring to?" Well, the ability to examine and think about a string of text (a program) and to come up with a mental model of what it does (the semantics).

-Wm

Structure

OK, I kind of see what you are saying, but I still think there is something arbitrary about it that I cannot really put my fingers on. For example, it seems that you want to be able to inspect a piece of program as a sequence of lexemes, and then be able to tell what any subsequence does without context. I said this is low-level, because in almost all languages, the abstract structure of a program, on which semantics is defined, is a tree. And at that level of abstraction, you can, analogously, look at a subtree and tell what it does. I still haven't understood what the advantage is in insisting on the low-level sequential textual representation of a program, which, to me, is just an (uninteresting) encoding of its abstract structure, an artefact of our inability to edit trees more directly.

Essentially, my understanding now is that "concatenative" says that this encoding is trivial. But why should I care about that at a higher level? And in a similar vein, why is it in any way preferable to have less structure on the higher levels of abstraction rather than more?

(Btw, just to help me understand your example: in your bit language, if you see a 0 in a random bitstream, how do you know it's the 0 instruction, and not part of a bigger instruction? And what prevents you from splitting 01 in the middle and produce something completely unrelated?)

"I still haven't understood

"I still haven't understood what the advantage is in insisting on the low-level sequential textual representation of a program" [...]

I'm not insisting on textual representation. Why would I do that? I'm telling you that textual representation is how our actual languages encode programs. Concatenative or applicative, they all encode our programs as flat text. Someday this might not be the case; but right now it IS the case.

So a language that has a theory that fits well with flat text has some advantages over one that doesn't.

Don't let me claim too much here, by the way. I freely admit that my argument doesn't prove "concatenative is better than applicative". I'm actually not interested in that argument. What I'm interested in is "what advantages does concatenative have, if any."

"an artefact of our inability to edit trees more directly."

Sure, fine; I'll grant that for the sake of argument, although I don't think trees are in any way a fundamental model of computation (so being able to edit them would merely be equivalent to a new programming language, NOT a paradigm shift). Suppose that sequential text representation IS a mere artifact of our current inability. So? Right NOW we can't edit trees; so let's use what we actually have.

"But why should I care about that at a higher level? And in a similar vein, why is it in any way preferable to have less structure on the higher levels of abstraction rather than more?"

I'm confused here. What do you mean "higher level"? And "abstraction"? In my limited experience, those terms are usually used to describe program design, not language analysis. I have literally no idea what you might be referring to here.

I'll answer your zeroone question in a separate reply.

Representational abstraction

By "higher-level" I was referring to the level of representational abstraction (character seq < token seq < parse tree < AST). As a user of a language, the AST level usually is all that matters to me on a grander scale, because that is where the semantics lives. Everything below effectively is a representational implementation detail. It may matter for editing and reading a program properly, but not for understanding its meaning. The latter is far more important than the former. And more structure helps.

Ok, I guess I see. I see

Ok, I guess I see.

I see massive problems with this response, though.

First is that your model isn't complete or accurate -- it's like the OSI's 7-layer model of networks; it's nice to have the terminology, but nobody who actually writes network code can afford to follow it. It's a minor detail that every implementation ever written disregards your model; it's much more important that many language designs depend on violating it (indentation based languages, for example). Concatenative languages are just another group of examples of languages that don't follow those rules -- eventually you need to take the huge number of exceptions as evidence that the rules aren't as universal as you thought.

Second, and even more importantly, where language design is done, we look for ways to allow people to use characters to express semantic intent; we can't just ignore the characters produced by people pressing keys on the keyboard. BECAUSE programming languages are generally textual, good programming language design MUST not ignore text.

Third, when you distinguish between "editing and reading" and "understanding its meaning" you're making an unimportant distinction. If a language design makes editing and reading easier, then it also makes understanding more likely to succeed.

So... I claim that there's some usefulness in how concatenative languages change the (character seq - token seq - parse tree - AST) and make it into (character seq - token seq - ASL). (ASL = Abstract Syntax List.) I don't think this shortcut trumps all of the difficulties real programmers have with real concatenative languages; I'm not that kind of tragic misunderstood artist :-). I just think there's something interesting to explore here, that might perhaps be useful, somehow.

You ended by saying "and more structure helps". Sure. But is function application the essence of structure? I don't see how that's possible.

-Wm

In my language there are

In my language there are only two functions defined, zero and one. If you see a zero, it's ALWAYS the zero instruction, because there are none other. (User definitions are not provided -- think of this as being more like a bitcode VM than a language. Yes, I'm working on an assembler for it that'll have most of the same characteristics but also provide conveniences like names and so on.)

If you split "01" (drop) in the middle, you'll get "0" and "1" (of course). Those are not unrelated; they mean the same thing when executed one ofter the other that "01" does when executed all at once.

To me, the creepy part is realizing that this means that if I have ANY function that ends in "0", and I concatenate it with ANY function that begins in "1", no matter what those two functions do, there's a "01" that means _drop_ right there where they meet.

Now, HOW that works does take some explaining. If you're curious, I'd be glad to explain; but first a couple of questions (assuming you ARE curious; if not, of course I won't waste your time). First, do you know combinator logic? (I think you do.) Second, have you read any part of Kerby's "A Theory of Concatenative Combinators" with any amount of understanding? How you answer those questions will allow me to tune my answer to your current understanding.

Zero and One

I do know of one language that uses 0 and 1 as the only instructions. But it also uses two cyclic registers to track the current meaning of an instruction, and external state.

Whirl.

Heh, that's actually not too

Heh, that's actually not too bad. IMO it's a better obfuscated language than Malbolge (which is not a language at all, it's just a shoddy encryption algorithm -- they could have met their apparent goals better by implanting a virtual machine in the middle of an AES algorithm, so in order to make the machine do anything useful you'd have to control the bits that appear in the middle of AES encryption).

Of course, you see the difference between Whirl and my "zeroone" -- my language has only one meaning for each symbol.

Yes, my language is almost as hard to program in as Whirl; it's so dang primitive. I built a brute-force search to find zeroone source code for a given combinator, and am building an evolutionary algorithm -- although rather than attempting to find source code for a combinator, I'm trying to find a better pair of primitives.

I'm wasting my time, but I'm having fun. You should see my other hobby projects.

ZeroOne

I am intrigued by how "push 3 items" becomes "drop 1" when concatenated with an unknown instruction. Do you have a blog where you can explain this?

Sadly, I don't have a blog

Sadly, I don't have a blog -- I've never been able to post continuously. If you'd like to explore, feel free to subscribe to the Concatenative mailing list; I can answer questions in far greater detail there. http://tech.groups.yahoo.com/group/concatenative/

For these purposes, I'm not sure what level of detail to go into. If you've read that web page by Kerby I posted earlier, you'll understand right away... Otherwise there's a little groundwork to lay first.

In short, the answer is that the "zero" and "one" form a complete "basis" over combinatorial logic on a stack (the usual logic operates over a parse tree). Therefore, however we DO it, there must be a way that stringing zeros and ones together forms ALL POSSIBLE combinators.

The longer answer is ... we had to figure out whether it was possible to invent a combinatorially complete basis that contained only two symbols and operated over a stack without any need for list syntax. It turns out that it is; but there are some complex restrictions that I still haven't completely figured out. I asked, we debated, and eventually Kerby turned up and was able to produce a combinator set; I then produced a different one that had some advantages (we later decided that neither set was definitely better than the other, so Kerby's definitely the inventor and the only person who could even say whether it was possible, although I did ask the question first).

I'll let you digest what I've said so far.

Minor OT: Polymorphic Recursion in Haskell

Polymorphic recursion has been required of Haskell implementations since Haskell 1.3 released in 1996.

Concatenative Dataflow Programming with Arrows

I've been developing a concatenative programming language based on an arrowized reactive/dataflow programming model. I had completely forgotten this discussion from eighteen months ago, so I was a bit surprised to realize I had contributed to it. One of my comments - about arrows allowing a richer environment than a stack - seems very relevant now.

John Nowak argues that concatenative programming being "amenable to dataflow programming" breaks down for higher-order programming. I agree with this for existing languages.

Fortunately, the arrowized model helps out a great deal by enforcing a clean staged programming model. For normal arrows, the inputs cannot be observed before the arrow is fully constructed, which results in a very rigid and inflexible program structure - but one very useful for dataflows and reactive programming (as demonstrated in a multitude of FRP libraries).

To gain some flexibility, I support 'static' data types along with static decisions. In Haskell terms, a 'static' number would be a number represented in the type system, and static decisions could be modeled by dispatching on typeclasses. In my language, all literals (text, numbers, and [blocks of code]) are of this static-typed nature. (But I don't support any form of quoting.)

The concatenative program is ultimately a compile-time metaprogram for a run-time arrow structure. A subprogram like `7 roll` might roll the 7th element of the stack to the top. More interestingly, `"foo" load` could automatically build the arrowized data-plumbing to load a value from an association list in the complex environment.

I'll provide a little example code. My language doesn't actually support type signatures (inference and introspective assertions only) so I'll just comment a few in.

% first :: ((a->a')*(a*b)) -> (a'*b)  -- PRIM
% assocl :: (a*(b*c))->((a*b)*c) -- PRIM
% swap :: (a*b)->(b*a) -- PRIM
% rot3 :: (a*(b*(c*d)))->(c*(a*(b*d))) -- PRIM
% intro1 :: a -> (Unit*a) -- PRIM
% elim1 :: (Unit*a) -> a -- PRIM

% assocr :: ((a*b)*c) -> (a*(b*c))
assocr = swap assocl swap assocl swap

% rot2 :: (a*(b*c)) -> (b*(a*c))
rot2 = intro1 rot3 intro1 rot3 elim1 elim1

% rot4 :: (a*(b*(c*(d*e)))) -> (d*(a*(b*(c*e))))
rot4 = assocl rot3 rot2 assocr rot3

% second :: ((y->y')*(x*y))->(x*y')
second = assocl swap rot2 first swap

% apply :: ((x->x')*x)->x'
apply = intro1 swap assocr first
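As a quick cross-check, the primitives translate directly into plain Haskell functions on nested pairs (my sketch, with primed names), and the type checker confirms the derived assocr:

import Control.Arrow ((>>>))  -- left-to-right composition, read like juxtaposition

swap' :: (a, b) -> (b, a)
swap' (a, b) = (b, a)

assocl' :: (a, (b, c)) -> ((a, b), c)
assocl' (a, (b, c)) = ((a, b), c)

-- assocr = swap assocl swap assocl swap
assocr' :: ((a, b), c) -> (a, (b, c))
assocr' = swap' >>> assocl' >>> swap' >>> assocl' >>> swap'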

I could provide a bunch of dup/over/pick instructions. But they're a bit more involved in my language since I support substructural types, and I don't want to get into those differences right here.

Also, I introduced rot3 to support block-free data plumbing. Without rot3 the same functions would use blocks, which works fine if our environment is just a simple single stack plus blocks:

second = swap [swap] first swap first swap
rot2 = assocl [swap] first assocr
rot3 = [rot2] second rot2

The blocks of code serve a similar role as they do in Factor. Developers are more limited in their ability to 'capture' runtime-computed values into a block.

One issue I quickly discovered is that a single-stack environment is awful for concurrent dataflow programming. In concurrent dataflows, I often have multiple workflows or pipelines that are mostly independent, but occasionally interact (sharing intermediate results), merge together, or divide into new subtasks. A single stack works wonderfully for a single pipeline. But the moment I try to integrate intermediate results, it fails: I need to dig too deeply into the stack to find values from the other workflow's computation, and this digging is fragile to upstream code changes.

So I tried a few other environments. Ah! I thought. A Huet-zipper-based environment looks expressive! So I first tried that. And it was very expressive. I could create subtrees to my right or left, wander into them, create values and objects. I came up with a very powerful idea: modeling a "hand" to carry things as I move around. So I could model different tasks or workflows in different subtrees, and carry intermediate results from one tree to another.

The problem was, a tree-zipper was too expressive. It was too sparse; never clear how to use the space effectively. I felt lost in a maze of twisty little passages, all alike. When I felt the hungry eyes of a grue waiting for that torch I was bearing (in my newly modeled hand) to fizzle, I bravely turned my tail and fled.

So I tried a couple more environments. First, a list-zipper of stacks. This worked pretty well, except that I could never remember the relative position of a task. The normal pattern I fell into is not jack-be-nimble jumping between tasks, but rather to extend one task as far as possible, then go to another task to build it up enough to get some intermediate results.

My current environment looks like this:

% ENV (stack * (hand * (stackName * listOfNamedStacks)))
% apply :: ((f*s)*e) -> se'
apply = assocr intro1 swap assocr first swap elim1

% take :: ((x*s)*(h*e)) -> (s*((x*h)*e))
% put  :: (s*((x*h)*e)) -> ((x*s)*(h*e))
take = assocr rot3 rot2 assocl rot2
put = rot2 assocr rot3 rot2 assocl

One point is that literals (numbers, text, blocks) are added to the current stack, to avoid burying the extended environment. The 'rot3' primitive enables this to work, by ensuring I can always do basic data plumbing without adding more blocks to the stack.

I must use static metaprogramming to access those named stacks, and I haven't implemented the functions yet. But the API I envision is: `"foo" load` and `"foo" store` to operate on remote stacks, and `"foo" goto` to navigate to a stack by name (which will store the current stack away under its current stackName).

The load/store operations enable named stacks to serve as registers or variables, useful for storing objects that would be painful to explicitly thread through the application (singletons, context objects, or configuration info, for example). Their stack-like nature can enable use similar to lexically scoped variables, assuming disciplined developers.

Anyhow, I've come to a few conclusions:

  1. The static nature of the arrow type is very promising for visualization. Continuous visualization of the data type during development should enable developers to operate at larger scales.
  2. Multiple stacks for multiple tasks! It's essential. Workflow programming, task concurrency, multiple pipelines - it's all bad on a single stack.
  3. The concept of a hand is awesome. It's useful for carrying things between stacks. It's also useful for operating on a single stack - i.e. it's often convenient to pick up a few items and operate deep on the stack, then drop those items. Most importantly, the hand - more generally, keeping a programmer model in the program - can potentially support deep integration of GUI with visual programming; I might try it for augmented reality.
  4. At one point, I tried to model a "literals brush" - a block in the environment that controls where literals are added to the environment. This was a mistake. It resulted in every subprogram that uses literals having a highly modal meaning.
  5. Imperative metaprogramming of a declarative, arrowized model has a lot of nice properties. Developers get a lot of expressive power with predictable performance. Further, the equational properties of the target model simplify reasoning and refactoring in the imperative code (e.g. we can know that the order we add operations on different stacks doesn't matter). Basically, we get the best of both worlds. (Whereas declarative metaprogramming of imperative programs seems to get the worst of both.)
  6. It is not difficult to reason about application to a subset of the environment. Arrowized models do this normally with `first`. Syntactically, we get the benefit if we have a common and well known library of applicators (words that apply a block) that operate on well known subsets of the environment - e.g. the top one to few elements of the current stack, and perhaps also an explicitly provided list of named stacks.
  7. Zipper environments are not so good. However, zippers are still promising for manipulating a document-like object that is sitting on the current stack. E.g. we could model an HTML-like document on the stack, sift through it, add and remove structures, or integrate between documents. Diagrams, higher-order programs, simple databases, DSLs, etc. can all be modeled as document-like objects.

I've been taking some flak for choosing a concatenative language. But I don't regret it, especially since it resulted in the powerful 'hands' concept and programming environments both simpler and more expressive than I would have dreamed up in an applicative language.

a single-stack environment

a single-stack environment is awful for concurrent dataflow programming

Yes. I have lately been considering how to implement concurrency in my language, and came to essentially the same conclusion as you. Static typing enables static program structure, which solves the problem by showing you which things are separable. If you have two functions of type (r a b c -> r d), then you can copy the top three elements of the stack, run those two operations concurrently, join on both, and return two values of type d.
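A minimal Haskell sketch of that shape in the nested-pair encoding used earlier in the thread (an illustration only; it ignores how the two calls actually get scheduled concurrently):

-- Both f and g have the effect (r a b c -> r d); since r is polymorphic,
-- neither can disturb the rest of the stack, so each can be handed a copy
-- of the top three elements and both results pushed at the join.
both3 :: ((((r, a), b), c) -> (r, d))
      -> ((((r, a), b), c) -> (r, d))
      -> (((r, a), b), c) -> ((r, d), d)
both3 f g s =
  let (r, d1) = f s   -- in a real implementation these two calls would
      (_, d2) = g s   -- run concurrently and be joined here
  in  ((r, d1), d2)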

Doing this efficiently of course means multiplexing those concurrent operations onto the available parallelism, which will be an interesting challenge.

I need to dig too deeply into the stack to find values from the other workflow's computation, and this digging is fragile to upstream code changes.

My solution to this was having an auxiliary stack, for which named local variables are sugar. Simply, if you can move computations off the data stack thread-locally, then you can reason thread-locally about data stack effects. I hate stack-shuffling and find no justification for it over named locals. When combinators are natural, they should be preferred—but the effort required to correctly factor most code is better spent elsewhere.

Their stack-like nature can enable use similar to lexically scoped variables, assuming disciplined developers.

Never assume that.

The static nature of the arrow type is very promising for visualization.

Really looking forward to seeing what you come up with in this area. My only work has been on making the kinds of stack effect diagrams seen in my article, which seem suitable for low-level (imperative) and high-level (dataflow) operations, but suffer for mid-level code that mixes styles.

I've been taking some flak for choosing a concatenative language. But I don't regret it

Ditto. :)

named locals vs. stack shuffling

I hate stack-shuffling and find no justification for it over named locals.

I'm hearing that 'hate' word a lot recently, regarding this issue.

I think stack shuffling is rarely difficult once you move beyond the newbie stage. A relevant observation:

The more complex shuffles, such as rot and roll, are only used a few dozen times in the entire code base (consisting of tens of thousands of lines), so indeed they are more of a crutch for beginners than a real tool that serious concatenative language programmers advocate using. Good code has simple data flow and does not need stack gymnastics. Factor's optimizing compiler, web framework and UI toolkit do not have any complex shufflers at all, and together they comprise some 30,000 lines of code.

But people new to concatenative stack-based languages - to them, it's dup, dip, bloop, dribble, rot... just arcane nonsense and gibberish, no better than Perl line noise to someone new to Perl. I can understand why they might hate it. They can't visualize what happens to the stack, and really shouldn't have to.

My hypothesis is that your named local variables help these newbies visualize the stack in a more concrete manner, and they help manipulate objects on it. And, if my hypothesis is correct, named locals should be replaceable by any other system that serves the same role, such as an IDE with good visualization features and good support for interactively automating data-plumbing code.

A problem with names is that names can be syntactically used in ways that their referents cannot safely be manipulated. By analogy, I can mail you the word 'gorilla', but I cannot mail you the gorilla. Names can too easily express behaviors that are inefficient or insane - such as transporting large values between CPU and GPU, sharing state between threads, naming deleted objects, fulfilling a promise with itself, or unintuitively keeping whole environments around just to support closures. Names seem especially problematic for substructural types - syntactically, it is too easy to drop a name unused, or use a name twice, when the type does not allow it.

There are other problems with names, such as nominative types binding at too large a granularity, or negotiating the meaning of a name with remote systems that might use different library versions.

Names are useful. But I think names are most useful for the non-locals: singletons, configuration parameters, 'special' variables or context objects, and concurrent workflows. And even in these cases, I think they should be modeled carefully, in a way that does not confuse programmers into thinking the name is equivalent to the referent (unless it really is equivalent to the referent in all relevant scenarios).

I heard two people having a violent agreement

My hypothesis is that your named local variables help these newbies visualize the stack in a more concrete manner, and they help manipulate objects on it. And, if my hypothesis is correct, named locals should be replaceable by any other system that serves the same role, such as an IDE with good visualization features and good support for interactively automating data-plumbing code.

Named locals are just one representation of the data flow DAG which I find convenient for teaching and implementation purposes.

Names can too easily express behaviors that are inefficient or insane - such as transporting large values between CPU and GPU, sharing state between threads, naming deleted objects, fulfilling a promise with itself, or unintuitively keeping whole environments around just to support closures.

All of these things need to be addressed, and some have been. Kitten has no mutation, is memory-safe, uses eager evaluation, and has cheap closures (only copying values and sharing references that are actually used). The transport of values between CPU and GPU is a big concern, as I would like AMP to feature prominently in (distant) future versions of the language. You want tight control over what happens when, and for inefficient code to look inefficient.

syntactically, it is too easy to drop a name unused, or use a name twice, when the type does not allow it.

That’s what good error messages are for. Names can actually help here—“x was copied but it is noncopyable” rather than “could not match type [a] with type ~[a]”.

names are most useful for the non-locals

Absolutely. A function, config value, or concurrency primitive models a behaviour, while a local models state; names are, as far as I can tell, much better for the former than the latter.

That’s what good error

That’s what good error messages are for. Names can actually help here—“x was copied but it is noncopyable” rather than “could not match type [a] with type ~[a]”.

Names were the cause of the problem in the first place. But I suppose it's nice that you'll at least have an easy way to name the problem. ;)

Anyhow, a better error message would say where the problem occurred, and maybe render what the type looks like in context, and highlight the problem type in red. Even better would be to integrate with the IDE.

You want tight control over what happens when, and for inefficient code to look inefficient.

I do indeed.

Names though?

Names were the cause of the problem in the first place.

How so? You can’t dup something noncopyable, whether you name it or not. And if you implement the same behaviour as dup using locals, then you at least have a source location to point to where the error is:

def myDup:
  ->x  // Okay, moves x.
  x    // Not okay; copies x.
  x    // Seriously not okay at this point.

Similar arguments apply to values with linear/affine types; I’m just being lazy about coming up with examples.

As I see it, the explicit

As I see it, the explicit use of 'copy' - and the expansion (code path) leading to it, including dup - would make it really easy to say "look, you say copy on this no-copy item. It's OBVIOUSLY wrong."

With names, multiple uses aren't nearly as obvious. There are safe usage patterns (e.g. a linear sequence of method calls where each returns a linear object) and unsafe ones. Since names exist for convenience (the argument that keeps being repeated), it is not unexpected for PL designers to enable as many safe uses as possible. People make mistakes because it isn't syntactically obvious what is safe or unsafe.

Dropping names is even less obvious.

Visualisation helps

Yes, this is where program text is inadequate. As I mentioned earlier, the names are just a textual shorthand for expressing the dataflow DAG, which is better rendered as an actual DAG:

    +---[ myDup ]---+
    |               | ~a
    |        +--[ x ]---->
 ~a |       /       |
----[ x ]--<        |
    |       \       | ~a
    |        +--[ x ]---->
    |               |
    +---------------+

(Pardon my ASCII.) Here the error is much more apparent, and the UI can indicate the exact point in the diagram where a unique reference got copied when it wasn’t supposed to be.

I agree, visualization helps

I agree, visualization helps find the error. Now if only we could have avoided it. Perhaps by not using a reference that can casually be copied or dropped when the referent cannot. :)

A rose in any stack position

But people new to concatenative stack-based languages - to them, it's dup, dip, bloop, dribble, rot... just arcane nonsense and gibberish, no better than Perl line noise to someone new to Perl.

I don't think the problem is unfamiliarity. Stack swizzling is just a terrible UI. I became fairly proficient at programming directly in assembler (many) years ago, but I wouldn't ever want to go back to that style of coding. There is a mental tax involved.

A programming language is low level when its programs require attention to the irrelevant.

I obviously don't have any experience programming in either of your (Dave's or Jon's) languages, but I will still stake the position that names are a better UI than stack swizzling, especially in the presence of visual feedback of the sort you're proposing. Consider switching from using a local stack parameter to a global configuration parameter -- can you do that by changing a name, or must you 'dip trip flip fantasia'? Also, sometimes you might want to make meta-level assertions about signals in ways that don't make sense computationally (e.g. that the buffer on the GPU matches the buffer on the CPU, or if I were to mail this gorilla...). Names can let you talk about these things, and in the presence of a visualization of the inferred stack manipulations, it should be clear when you're doing things with names that don't have a well-behaved computational correspondence.

Locals and Globals, Names and Nonsense

Consider switching from using a local stack parameter to a global configuration parameter -- can you do that by changing a name

Yes. Concretely, a programmer might express that as `"use_opencv" store` in my language, to move the top item of the current stack (ignoring the text we just added) to the top of a (potentially new) named stack identified by the text "use_opencv". A named stack can be used as a global variable for configuration data.

The environment is modeled with the product `(stack * (hand * (stackName * listOfNamedStacks)))`. The actual motion from the top of the stack to the correct location in the listOfNamedStacks does require some rich manipulations of this environment structure. But. My language supports metaprogramming by use of introspection. (Not really different from the use of typeclasses for typeful dispatch; "use_opencv" is formally a unique unit type, not a runtime value. I don't have dependent types, though they're on my wishlist.) So, when dealing with global names, all that 'dip trip flip fantasia' is automated.
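For illustration, here is a rough Haskell sketch of that environment product, using homogeneous lists in place of the typed nested pairs purely for brevity; the names Env, Hand, and store are hypothetical, and in the real language this plumbing is automated via introspection:

type StackName = String
type Stack a   = [a]
type Hand a    = [a]
type Env a     = (Stack a, (Hand a, (StackName, [(StackName, Stack a)])))

-- Move the top of the current stack onto the named stack, creating the
-- named stack if necessary (a simplified model of `"use_opencv" store`).
store :: StackName -> Env a -> Env a
store name (x : stack, (hand, (cur, named))) =
  (stack, (hand, (cur, push name x named)))
  where
    push n v ((n', s) : rest)
      | n == n'   = (n', v : s) : rest
      | otherwise = (n', s) : push n v rest
    push n v []   = [(n, [v])]
store _ env = env  -- empty current stack: leave the environment unchanged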

Global names I don't have a problem with. Names are convenient when used to speak of things that are far away - i.e. that we cannot easily pick up or point at. Even so, I use the quoted name "use_opencv" to enforce a very clear distinction between the reference and the referent.

But speaking of globals is misleading. Those are a relatively small part of our programming experience. To argue we shouldn't use stack swizzling because it's painful for globals is analogous to saying we shouldn't use a mouse because it's painful to copy characters out of a charmap. "Doctor, it hurts when I do this!" "Then stop!"

My position is that stack shuffling is simple and works great for the bulk of the programming experience, where there are only a few arguments or intermediates in play at a time, they have stable call positions, and they've already been gathered by the caller. This accounts for basic, reusable functions and procedures. While stack shuffling might still not be fun, it is often better than the syntactic and semantic overheads of declaring and managing names.

Visualization helps with the 'mental tax'... I imagine it helps almost as much as writing with a text area instead of a line editor. Even visualization of globals can be useful to provide context. Of course, we could also support drag-and-drop gestures with globals that write code like `"foo" load` or `"foo" store` or `"foo" loadCopy` or `"foo" recursiveDropStack`.

If developers want to automate some of the local data plumbing by drawing a few wires, I can support this too (though I'd refuse to make it opaque to the user; I'd like a little understanding to occur by osmosis if nothing else).

I even could model local names with metaprogramming. Similarly, I could model brainfuck with metaprogramming. It isn't hard to build a Turing tarpit for the people who really insist on working within one.

sometimes you might want to make meta-level assertions about signals in ways that don't make sense computationally (i.e. the buffer on the GPU matches the buffer on the CPU or if I were to mail this gorilla...).

We can assert "the gorilla in my left hand must, at runtime, weigh more than the banana in my right hand" without naming either of them. We just need a language to make such assertions - i.e. pairwise declarations that relate the top two elements of the stack.
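For concreteness, a toy Haskell version of such a pairwise assertion over the nested-pair stack encoding used earlier in this thread; assertTop2 and its runtime error are hypothetical stand-ins for whatever static or dynamic check the language actually provides:

-- Relate the top two stack elements without naming them.
assertTop2 :: String -> (a -> b -> Bool) -> ((s, a), b) -> ((s, a), b)
assertTop2 msg rel st@((_, a), b)
  | rel a b   = st
  | otherwise = error ("pairwise assertion failed: " ++ msg)

-- "The gorilla (second from the top) must weigh more than the banana (on top)."
checkWeights :: ((s, Double), Double) -> ((s, Double), Double)
checkWeights = assertTop2 "gorilla outweighs banana" (>)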

Names would be useful if we wish to grab and relate global artifacts, but that's just the same story as before. In practice, I think the common case would be local assertions, since that leads to a more constructive and compositional analysis. E.g. in your CPU-to-GPU buffer example, you would generally make the assertion near the site of transfer, at which point both buffers would be present.

Names might seem useful for 'dead' objects, e.g. when comparing preconditions to postconditions on a linear object, or in terms of your example - perhaps an equivalence after buffer transfer.

Fortunately, there is a good name-free solution for this, too. Names aren't the only way to describe an object. We can support a primitive that creates a 'memory' of an object's type at a given point - i.e. a kind of staged reflection:

memoryOfALonelyType :: x ~> (Ty x * x)

This operation can be safe even for linear types. As a rather nice consequence, this nameless memory of a type as a first class static object should support first-class refactoring and decomposition of the type analysis and declarations code. Further, you never need to say things that aren't sensibly behaved; you have a sane modal distinction between an object and a description of an object.
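A rough Haskell approximation of that signature, using Data.Typeable's TypeRep as a stand-in for the language's static type descriptions; memoryOfAType is an illustration, not the actual primitive:

import Data.Typeable (Typeable, TypeRep, typeOf)

-- Pair a value with a first-class description of its type. Later code can
-- copy, compare, or store the description without touching the value itself,
-- which is what keeps the idea compatible with linear values.
memoryOfAType :: Typeable x => x -> (TypeRep, x)
memoryOfAType x = (typeOf x, x)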

in the presence of a visualization of the inferred stack manipulations, it should be clear

You seem to be assuming that "the presence of a visualization" is orthogonal to the use of a concatenative programming model where the full environment is an explicit, well-typed object to be passed from function to function.

I think such an environment may be much more difficult to achieve in an applicative language, or one that heavily uses names. They don't make the environment very explicit in the programming model.

De Bruijn vs named

I agree. Even in formal calculi, most people strongly prefer working with named binders over De Bruijn notation. Although the latter is simpler to deal with on a purely technical level (no trouble with alpha equivalence etc.), it is rather impractical and error-prone for most human minds. Just try following a proof that computes with a De Bruijn-indexed lambda calculus -- it requires ten times as much concentration, care, and time.

Names are an essential abstraction to reduce cognitive load. Same in programming languages.

The fascination with stack juggling languages strikes me as an instance of falling victim to the joy of "sense of achievement".

De Bruijn vs. Named - false dichotomy

De Bruijn vs. Named is a false dichotomy. I'm certainly NOT suggesting we use numbers-as-names instead of symbols-as-names. That'd be like choosing the very worst of both worlds.

Names aren't essential. You're just using them as a poor man's visualization tool. If I needed to follow proofs presented using De Bruijn indexes - or at least more than a few small lemmas - I would be tempted to write a program to help me visualize what is happening - i.e. implement various test cases, perhaps animate what is happening, probe intermediate states.

I agree that stack programming has a cognitive load. Stack programming without automated visualization seems analogous to using a line editor instead of a text editor. Which is to say, it isn't all that difficult once you have a little experience with it, but it could be much easier. In the absence of visualization, the cognitive load for writing concatenative code seems to be lower than for reading it, perhaps because the writer knows how each word she uses will impact its environment but this is not syntactically obvious to the reader. (OTOH, effectful code, or code with callbacks or continuations, has the same problems even in applicative code with abundant use of names.)

But, perhaps as a consequence of this, there is no "sense of achievement" for merely writing the code - not beyond the normal "getting started" or "mission accomplished" achievements we feel in any language. A "sense of achievement" for just getting stack shuffling right would wear out relatively quickly.

But you would be hard pressed to find people more satisfied with programming than in the concatenative languages communities. What concatenative programmers feel is perhaps not 'achievement' at making things work at all, but satisfaction of doing so concisely, with easy refactoring and simple semantics (occasionally augmented with powerful metaprogramming). Or maybe it is because the language is more easily adapted to the user; conventions are very flexible.

Easier to refactor?

De Bruijn is a pretty exact analogy AFAICS. Except that you cannot even refer to the i-th stack element directly in these languages, but have to do so indirectly by extra commands that shuffle it to the top first. Frankly, from where I stand, that is even worse.

I question that stack programs are easier to refactor -- exactly because they lack the abstraction of naming, and so induce stronger coupling in the correct bookkeeping of irrelevant details. ...Unless you are talking about low-level syntactic aspects of cutting & pasting code segments, which are not terribly important if you ask me.

De Bruijn is a straw-man analogy

You seem to be focusing on one small aspect: how to reference values.

Things you aren't considering include: multiple return values, multiple parameters, how words in stack-based languages can continuously clean up the stack, or consequently how tail-call recursion can be expressed directly.

you cannot even refer to the i-th stack element directly in these languages

Sure you can. Unless you're nit-picking on the word 'refer'. We can certainly access the i-th stack element.

Trivially, if you can describe stack-shufflers to extract the i-th element to the top of the stack, you could name and reuse these shufflers. So you could create a series like "roll2..roll9" to move the i-th element to the top, or "pick1..pick9" to copy the i-th element to the top. If you see a pattern (and you will), and if your type-system allows it, you could even abstract these words so that the 'i' value is an argument - and doing so would get you the FORTH-like pick/roll commands.
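For concreteness, a few such fixed-depth shufflers written as ordinary Haskell functions over a nested-pair stack; the names roll2, roll3, and pick2 follow the convention described above and are purely illustrative:

-- roll2: bring the second element to the top (i.e. swap the top two).
roll2 :: ((s, b), a) -> ((s, a), b)
roll2 ((s, b), a) = ((s, a), b)

-- roll3: bring the third element to the top.
roll3 :: (((s, c), b), a) -> (((s, b), a), c)
roll3 (((s, c), b), a) = (((s, b), a), c)

-- pick2: copy the second element to the top.
pick2 :: ((s, b), a) -> (((s, b), a), b)
pick2 st@((_, b), _) = (st, b)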

In practice the need to refer to more than the third element is rare, especially in well-factored code, so you don't see developers using these much. But they're available if you want them.

I question that stack programs are easier to refactor [..] Unless you are talking about low-level syntactic aspects of cutting & pasting code segments

The ability to find common code segments and factor them without disentangling local names is useful for refactoring (and certainly lowers the barriers, so it happens more continuously), but there is more to it than that.

Concatenative code seems more effective than applicative code when refactoring the internal structure of functions - i.e. dealing with all those temporary variables, or those cases where the same argument gets passed to a bunch of different functions. It also keeps the environment cleaner while doing so, rather than building up a bunch of irrelevant temporary variables or explicit re-assignments.

There is more to well-factored concatenative code than cut-and-paste. The words should have meaningful connotations to humans. And we want the input elements on the stack to all be relevant. And we also want the inputs arranged to minimize shuffling in the common call case (if there is more than one common call-case, we'll tend to define a few words). Same with the outputs, if any: they should be arranged so they can be immediately used in the common case. It is because of this factoring that complex stack shufflers are rarely needed.

stack programs [..] lack the abstraction of naming, and so induce stronger coupling in the correct bookkeeping of irrelevant details

I suppose by 'irrelevant details' you mean 'locations on the stack'. True enough.

But using names just trades one set of irrelevant details for another. With names, I need to deal with all these symbols and spellings that have no impact on the meaning of the computation. And the bookkeeping for these irrelevant names is often horrendous: every relevant name must be spelled out at least twice, once as an LHS and once as an RHS, resulting in all sorts of verbosity. Symbols often have multiple human meanings, so we often encounter accidental shadowing (the word 'map' is especially bad). I need to concern myself with lexical vs. dynamic scope (or worse). I need to concern myself with alpha equivalence, and potential name capture in macros. I need to deal with this funky semantic distinction between reference and referent. And, oh, then I need some handwaving to explain tail calls or named return value optimizations.

However bad you imagine the bookkeeping of irrelevant details is for stack languages, names are worse. You just stopped thinking about it, due to your experience. The same happens to a concatenative programmer - minor stack manipulations fade into the background. But since there is less bookkeeping to perform or reason about in stack languages, the remaining language is simpler and more concise.

Stack languages are worse

But using names just trades one set of irrelevant details for another.

When you write a function with names, first you write down the parameter names. It's a natural step in the design process, makes good documentation, and you'll probably need some equivalent commenting in your stack language program anyway. Once you've done that, the cognitive load of trying to get at the names where you need them is very low. You might have to glance up to remember what you called the parameter, but that's not as difficult as trying to keep track of where values have been shuffled on the stack by all of the other words you've written thus far.

Cognitive load is easily addressed.

The total cognitive load for names is higher than "trying to get at the names where you need them".

With tacit programming, I quite appreciate not having to establish good names. I apparently avoid some decision fatigue by avoiding such irrelevant decisions. When I comment what a word does, it is unusual for me to name all parameters, or even to use a consistent name. And there are indirect effects. Using names for parameters leads to verbose expressions. Local names get entangled with expressions, which hinders refactoring.

Also, you seem to be assuming a single-stack language. I can say for certain that modeling a second stack for the 'hand' (with take/put/juggle ops) makes it a great deal easier to prevent items from getting buried or lost. I can just pick something up if I'll need it downstream, or if I need it out of the way. The hand is intuitive; it doubles the number of items I can mentally track and manage, and eliminates most need for careful factorings.

But I believe you raise a valid point.

Programmers in concatenative languages must concern themselves with "keeping track of where values have been shuffled on the stack" and "getting at them where you need them".

In practice, developers take steps to make this easier on themselves: factor difficult functions into small pieces, keep the stacks clean, use simple calling conventions. One could say they are trading the cognitive load of "keeping track" for the cognitive load of developing a simple factoring or API, even in cases where code won't be reused.

Programmers using names can be a lot sloppier. They can write functions fifty lines or longer, create a huge mess on the stack, and simply ignore it. Which isn't a bad thing.

Unfortunately for your position that "stack languages are worse", the cognitive load you complain about is easily addressed with automatic visualization. Similarly, if developers don't want to think about 'getting at an item' I can provide a little automatic code generation. I can do this without accepting the verbosity, irrelevance, decision fatigue, and multifarious semantic issues that names introduce.

You might wonder: "but why is this important? why rely on an IDE? why not just use names so the problem is handled in the language?"

Well, I consider PL to be a user-interface, and vice versa. We are - through text, icons, forms, gestures - influencing the behavior of a computer system. Unfortunately, most PLs are poorly designed as user-interfaces; for example, they require us to use names like a psycho with a label-maker, instead of just using a few gestures to establish our meanings. Conversely, I consider many user interface designs to be awful PLs - lacking composition, abstraction, expressiveness, reusability.

I would like to address this stupid, short-sighted design gap - one that I expect has cost us more than a few billion. I'm not the first to try, but I have a new perspective on an old idea - simple, powerful, and extremely promising.

I recently realized the possibility of representing a user-model within a program - i.e. a programmer-model. It started with the recent idea of 'navigating' a zipper-based or multi-stack environment (in order to program concurrent behaviors in Awelon), and was soon augmented: use a 'hand' (with take/put/juggle ops) to carry objects (including runtime signals or wires) as I navigate.

But a programmer model is nowhere near the limit of that idea. We're modeling an environment, and we're modeling a programmer. The next step should be obvious: why not model the integrated development environment. The entire user-interface (menus, workspaces, open pages, etc.) can feasibly be modeled in the type-system of an arrow. Every button-click, gesture, menu selection, text area and character typed can be formally modeled as a concatenative extension to the 'program' that describes this IDE. Even multi-user environments are feasible, perhaps using a DVCS-like system.

Such a system could support continuous extension (and reduction, where I want it) and metaprogramming, would be robust to some history rewriting (undo, macros, optimization), could be efficient through snapshots and replay. The 'user interfaces' could contain live, time-varying signals in addition to ad-hoc statics, and would be easily composed. Security could be modeled within the language.

Anyhow. Summary:

Names are okay for some purposes - large-volume configuration parameters, singletons, wiki pages, reusable modules or software components. But such names should be used carefully - reference distinguished from referent, explicit lookups - so that people don't gain bad intuitions about them.

I believe named locals are interfering more than you know, and helping you less than you imagine. The cognitive load for tacit, concatenative programming is easily addressed and then some. I think the corresponding issues for names - being more semantic and implicit - cannot be addressed as easily.

Unconvinced

I'll agree that names are less important the more local you get, but short functions can have parameter names like x, xs, n or f. That's usually how it's done in Haskell and I doubt anyone gets decision fatigue when picking a letter for a parameter. So for locals, a single letter name like x is easier to type than a stack manipulator like... bloop ;). And as you move to larger scopes, referencing things over a larger distance, good names become more important, but it sounds like you agree with that.

I'm all for designing a programming language to be a good UI, but my thinking has me doubling down on names.

I don't really get the idea of IDE interaction as a program. I get that you could do that, but not why it would be interesting. And wouldn't this proposal restrict the programmer to edits that leave the program in a well-typed state at every intermediate step along the way? That seems like it would be very frustrating. I'm not sure that I understand this idea, honestly.

Short name vs. long name is

Short name vs. long name is pretty much irrelevant regarding any of my concerns. Many cognitive and verbosity benefits of short reusable names are undermined by accidental shadowing. And the real semantic issues aren't addressed at all.

Also, pointing at 'x' vs. 'bloop' in terms of character count isn't convincing to me. One should certainly count the separate declaration, and the extra line typically needed to make it (`let x = ... in`, or `var x = ...;`, `foo = function(x) {`, etc.). It is also my impression that, in stack programming, objects on the stack are often already where they can be used without manipulation.

I wonder what the average ratio is between parameters/local names in applicative code vs. explicit stack manipulations in concatenative code. My guess is that this ratio is between two and four, but this is something that could be tested against an actual codebase with similar libraries.

I'm all for designing a programming language to be a good UI, but my thinking has me doubling down on names.

I do not know of any 'good UIs' that require humans to declare and use even a tenth as many names as modern programming. But let me know if you find an exception.

the idea of IDE interaction as a program [..] why it would be interesting.

It is interesting because it enables well-defined but ad-hoc mashups, transclusion, views, transforms, and extensions, metaprogramming and reprogrammable UI, undo that can be precise or pervasive. It is interesting because it can unify programming environments and user-interfaces. Applications become extensions. Programming can be a flexible, ad-hoc mix of live operations or manipulation, and constructing behaviors to apply later. Security can be modeled formally and pervasively.

wouldn't this proposal restrict the programmer to edits that leave the program in a well-typed state at every intermediate step along the way?

The programmer would be restricted to manipulations that leave the programming environment in a well-typed state at every intermediate step. So this would include not installing extensions or running transforms for which the environment isn't well typed.

If you wanted to represent an ill-typed program, you'd do it much like you might in Haskell: use an AST, or text.

IDEs

The potential benefits of short reusable names are undermined by accidental shadowing

Speaking of problems that are easily remedied with IDE feedback, this is one.

It is interesting because it enables well-defined but ad-hoc mashups, transclusion, views, transforms, and extensions, metaprogramming and reprogramming, undo that can be precise or pervasive. It is interesting because it can unify programming environments and user-interfaces. Applications become extensions. Programming can be a flexible, ad-hoc mix of live operations or manipulation, and constructing behaviors to apply later. Security can be modeled formally and pervasively.

The IDE I'm building supports most (all?) of these features under a simple framework, but I apparently don't share your ontological view. Most specifically, I don't see anything useful about viewing arbitrary input to a UI as a program. The only thing useful you can do with arbitrary UI input is undo/redo it (possibly cloning the state of the application between). Languages typically have richer structure that makes it useful to consider them as languages.

If you wanted to represent an ill-typed program, you'd do it much like you might in Haskell: use an AST, or text.

OK, I assumed you had in mind more alignment between the environment and the program, like in a staged programming environment.

OK, I do see the appeal

Mapping keys / gestures onto language elements would be a good way to allow custom configuration, and there is still value in using languages when the lowest common denominator might just support undo/redo.

I was already planning to have certain UIs exposed through language (I think we've discussed that in the past), but I'll have to think more about exposing every UI through its language. Maybe that is a good idea, after all.

And I can see how that might map more directly to a concatenative language. I'll think more on that, too.

IDEs with Direct Semantic Access

The IDE I'm building supports most (all?) of these features

Oh really? I wonder if you're interpreting my words differently, or in a far more constrained (or crippled) manner, than I mean them.

Let's try a scenario: copy and paste of an observed time-varying signal, along with a pair of control capabilities associated with buttons.

Let's assume the capabilities can influence lighting in your house, and the signal describes the current state of the lights. You want to integrate this with a software agent that will record how long your lights are on, and turn them off after a configurable time if a motion sensor signal hasn't reported activity.

Can your IDE do the copy and paste? If so, does it show up as a mess of context-sensitive code to get the signal to the agent? Or is it just a formal 'programmer hand' modeled in the development environment, with precise data plumbing semantics?

I don't see anything useful about viewing arbitrary input to a UI as a program. The only thing useful you can do with arbitrary UI input is undo/redo it

Nonetheless, you'll execute a subprogram for every gesture, mouse move, and manipulation. It's just, if you follow normal conventions, your subprograms will affect a separate view-state of your program (your 'IDE'). This isn't so much an issue for pure views, but the moment you start manipulating the program in rich, ad-hoc ways, you'll encounter a huge semantic disconnect between the view and the manipulation.

This semantic disconnect has historically been a 'wall' between PL and UI. Even for languages like Smalltalk.

This wall can be addressed by modeling UI actions - navigation, cut and paste, menu selection, etc. - as acts of programming. Let's call the generated program 'E'. Program E formally models a programmer navigating and manipulating a user interface and development environment. Every time the programmer/user does something, it is modeled as a concatenative extension to this program. (With an exception for 'undo', though even that could be modeled explicitly - capabilities to view our history or move into an alternative history.)

An advantage of this design is that we now have a formal, internal meaning for exactly what it means to navigate, to view, to cut, copy, and paste - a formal meaning that will make sense from the perspective of the language rather than being interpreted through an external IDE. By avoiding the indirection through an external IDE, the types involved in the operations can be much richer - live signals, capabilities, sealer/unsealer pairs, etc. Further, the context is much more robust and persistent.

A programmer can develop directly (e.g. extending program E, attaching behaviors to real data-hoses), or indirectly (using program E to compose blocks or describe ASTs that will later be interpreted, without activating them). Mixed-mode systems are certainly possible, e.g. describing a software agent that, if you ever activate it, will have access to a live signal indicating the state of lights in your house. (This is an example of what I earlier described as "an ad-hoc mix of live operations or manipulations, and constructing behaviors to apply later".)

Does your IDE do this? Or is it stuck like most IDEs behind a wall of syntax, with fragile context and crippled manipulations?

Languages typically have richer structure that makes it useful to consider them as languages.

In natural language, the rich structure is semantic, not syntactic. The same is true in the language I'm developing, Awelon. The structure is all there in the type system.

I assumed you had in mind more alignment between the environment and the program, like in a staged programming environment

I do. But a developer who wants to build a program to be applied later can certainly do so within "the program" E. Thus, the scenario I believe you were envisioning is addressed well enough.

What does Direct Semantic Access mean?

Oh really? I wonder if you're interpreting my words differently, or in a far more constrained (or crippled) manner, than I mean them.

Of course you do :)

Let me see if I understand your scenario. I have a process running that's monitoring my house motion sensor and building a history. I have capabilities to control my lights. I have (or want to write) an agent that looks at the history of the sensor and uses it to control the lights.

I would want to support most of this scenario, but I would not support cut/paste of the active process that's building the history. You'd need to leave the monitor process building the history and hook-up a reference to it.

I don't really understand the rest of what you want. I like editing constructs as text (or maybe ADTs directly) and not as live objects.

Also, I've thought about it a little bit more and I don't think I'm going to change my approach regarding UIs as languages. General UI elements don't need to have meaningful language representations. The most general thing I can see making sense with an arbitrary UI is to capture and rewind/replay parts of it. I do think it will often make good sense for there to be a shallow translation from UI interactions into a core application language (as is common with scriptable environments like Word or AutoCAD, etc.), but that's different.

Most UIs and programmable

Most UIs and programmable devices are live objects. I.e. UIs are provided from external services, or encapsulate capabilities you cannot access. So, for my goals of unifying PL with HCI, this is among the problems that matter most. Another relevant feature is a direct or shallow relationship between UI elements and signals/capabilities that are objects in the language. Without that, it's very difficult to programmatically observe and manipulate the things the human can see and manipulate, which is just awful.

The scenario was a signal indicating current state of the lights, along with capabilities from buttons. I imagined, but did not explicate, that you were grabbing these from a UI element (e.g. a form, or a webpage). But your interpretation works well enough. Would you be able to "copy and paste" to get that reference you need? Could you do such a thing with formal meaning? How much extra knowledge would be needed to go from reference to getting your information about lights usage?

Edit: I tend to think this should go without saying, but...

To unify PL and UI, it is essential that the set of computations we can observe or influence are exactly the same whether we do it by hand through UI widgets, or programmatically. Further, the act of programming must be an action on the UI - i.e. directly grabbing a particular input or output field from a widget and integrating it into a software agent, or another widget - such that we don't need special under-the-hood arcane (or proprietary) knowledge to manipulate or observe stuff programmatically.

It is okay to manipulate some artifacts as text or ADTs, so long as this principle is supported. Among other things, any text you observe or manipulate must also be subject to this principle, and thus subject to programmatic observation and manipulation.

When you say "General UI elements don't need to have meaningful language representations," I think your meaning is that you have no such principle. You will lack a consistent means to programmatically extend and manipulate your user interfaces. You seem to be assuming that a common approach to user-interface with programs is sufficient. It really isn't.

The most general thing I can see making sense with an arbitrary UI is to capture and rewind/replay parts of it

I would like to see a future where the 'general' set of things we can do with UI includes programmatic manipulation or observation, capturing live fields from one form into another (mashups, transclusion), extending the UI with software agents and new macros, and so on. The qualities we attribute to a 'good PL' - composability, modularity, expressiveness, extensibility, abstraction, simplicity, safety - all apply to a good UI.

Unifying PL and UI means more than using PL as an effective UI; it also means turning UI into a good PL.

I don't understand the

I don't understand the significance of the house light example. I think we're coming at this from different angles, but yes, one of my goals is to have the IDE function as an OS, and manipulating the lights in my house doesn't sound like a particularly challenging example problem. But you also seem to be focusing on some particular copy/paste behavior that I don't understand (e.g. "Could you do such a thing with formal meaning?" -- I don't know what you mean).

Further, the act of programming must be an action on the UI

I don't exactly agree with this. Widgets are made for interaction with people. Programmatically, we usually want to bind to something below this layer, though I will agree it should be just below it. As an example of what I mean, people will enter numbers character-by-character at the keypad, but programmatically we want to pass a number and not a string.

When you say "General UI elements don't need to have meaningful language representations," I think your meaning is that you have no such principle.

It's not clear to me that a command language is necessary for every interaction. At lower levels where the command language would look like "click (100,200); keypress 'A'; ...", it's not clear to me that it's a good idea to think of this as a language. I think I'd rather see things from the viewpoint that every data stream could be interpreted as a language, but then not use that interpretation for cases where it makes little sense.

I'm open to being convinced that it is a good idea to view every UI as corresponding to a language, but I don't see this as a particularly important idea.

You will lack a consistent means to programmatically extend and manipulate your user interfaces.

I think you're assuming too much.

The most general thing I can see making sense with an arbitrary UI is to capture and rewind/replay parts of it

I would like to see programmatic manipulation [...]

My wording was poor, but the context was the UI language. Yes, we can do other things with UIs themselves, but I was asking what we might do with an interaction transcript in the language of an arbitrary UI, such as "click; keypress; etc".

significance of the house

significance of the house light example

The ability to copy-and-paste the capabilities AND signal - e.g. copy from the buttons and label in a 'my smart house' form, paste into a software agent. That was the only significant part of my question. It's also the only part I actually asked you about. I'm not suggesting it's a hard problem; the agent itself should be easy, but it will be easy ONLY if you can easily extract the necessary capabilities and signals from an ad-hoc UI component, and combine them with a sensor capability from someplace else entirely.

Widgets are made for interaction with people. Programmatically, we usually want to bind to something below this layer, though I will agree it should be just below it.

It's fine to have a shallow binding, so long as 'people' can directly get at this binding in order to programmatically use them somewhere else.

At lower levels where the command language would look like "click (100,200); keypress 'A'; ...", it's not clear to me that it's a good idea to think of this as a language.

You seem to be focusing on the 'character by character' aspect. I don't really care whether you do that in some logically intermediate state that gets validated and transformed according to input type (or other declarative rules) underlying an input form.

I was asking what we might do with an interaction transcript in the language of an arbitrary UI; such as "click; keypress; etc".

Access to this is useful if you want to extend or manage the set of macros, shortcuts, prediction, etc. in ways the original designer did not anticipate. But it doesn't mean every single widget must handle it. Though, whatever does handle it should ideally be subject to observation and programmatic manipulation, and thus explicitly part of the environment - so we can develop new macros, predictions and tab completion, memory of history, and other useful features that the original designer did not anticipate.

Discoverability

The ability to copy-and-paste the capabilities AND signal - e.g. copy from the buttons and label in a 'my smart house' form, paste into a software agent. That was the only significant part of my question. It's also the only part I actually asked you about. I'm not suggesting it's a hard problem; the agent itself should be easy, but it will be easy ONLY if you can easily extract the necessary capabilities and signals from an ad-hoc UI component, and combine them with a sensor capability from someplace else entirely.

OK, I think I understand the example problem now. You're worried about discoverability. You have a UI that does something, and without probing into the documentation (arcane knowledge) you want to be able to capture the abilities that it has and use them programmatically somewhere else.

I agree this is an important goal. So you want to have a discoverability protocol for arbitrary UIs. I'm not sure if you're literally suggesting copying/pasting the button as the UI for that (which I don't think is a good idea) or are suggesting something like "view source" followed by copy/paste or a "capture commands" mode for UIs that, instead of running commands, shows you what would have been run (and from which you could copy/paste).

Anyhow, I'm on board with all of that, but I think the latter idea of "capture commands" will work better than "view source", for reasons closely related to the fact that not all UI is coherent as a language. If you try to start "at the top" and ask the UI for "what happens when I press enter", then you're going to find yourself in the plumbing of a text widget. And there might be intermediate modal dialogs, client-side prediction, etc. that would get in the way of the "real" capability layer that you're interested in. What makes more sense to me is to have opt-in protocols at arbitrary levels of abstraction (at the click level if desired) where you can see the commands produced by the UI at that level.

What makes more sense to me

What makes more sense to me is to have opt-in protocols at arbitrary levels of abstraction

Totally agree with that. If good UI should have the properties of good PL, supporting a flexible and appropriate level of abstraction is certainly among them.

But I am probably imagining a different UI structure than you. I tend to think the user-model should be processing all the user's lowest-level signals and certain view-transforms, while the UI 'structure' itself should be higher level and much closer to semantics. I guess this POV evolved from developing a couple multi-user systems, where users need different view-preferences and macros and such.

But I am probably imagining

But I am probably imagining a different UI structure than you.

I think you could prove that in court. It will be interesting to see where you end up.

I don't buy it

I agree that names can have their problems, too. But compared to the alternative, those are luxury problems to have, which I'd trade any day.

Seriously, I don't find these arguments convincing at all, if not totally backwards. I also don't see why multiple parameters/return values are best expressed by stacks, or why the unavoidable abstraction and capability leak you introduce everywhere by allowing everybody to see all their caller's arguments and locals is anything but a horrible misfeature.

But to be honest, I also do not care all that much. As long as I don't have to deal with it, everybody is entitled to enjoy their own masochistic treat. :)

don't see why multiple

don't see why multiple parameters/return values are best expressed by stacks

I haven't argued that multiple return values are "best expressed by stacks". What I would argue is that multiple return values are (1) one of the reasons your analogy to De Bruijn numbers fails, (2) one of the reasons that stack-based languages are effective at refactoring the internals of functions.

the unavoidable abstraction and capability leak you introduce everywhere by allowing everybody to see all their caller's arguments and locals is [..] a horrible misfeature

While FORTH does have this problem, it is not fundamentally difficult to control how much a subprogram sees in concatenative or stack-based programming. Factor, Cat, Kitten, and other stack-based languages make it easy, through use of combinators or types. John Nowak described one such combinator, 'dip', in the first post in this topic.

compared to the alternative, those are luxury problems to have

The biggest problem with this argument is that you've already established your complete (and apparently unabashed) ignorance of "the alternative" under discussion. Your arguments seem to consist of blind prejudice, false assertions (i-th element, unavoidable capability leak), and straw-man analogies (De Bruijn numbers).

I can understand that you're comfortable with names, that any problems they might have are the 'devil you know'. I can also respect that you "do not care all that much" and that you're happy to remain ignorant of alternatives. But, please, if you're going to argue, at least understand the subject.

Work-arounds

I haven't argued that multiple return values are "best expressed by stacks". What I would argue is that multiple return values are (1) one of the reasons your analogy to De Bruijn numbers fails,

Fine, add anonymous tuples with only indexed projection to the mix then. Equally brittle and equally tedious to work with.

(2) one of the reasons that stack-based languages are effective at refactoring the internals of functions.

I fail to see the connection. That's an unfounded claim that neither matches my experience, nor makes any sense to me technically.

While FORTH does have this problem, it is not fundamentally difficult to control how much a subprogram sees in concatenative or stack-based programming. Factor, Cat, Kitten, and other stack-based languages make it easy, through use of combinators or types. John Nowak described one such combinator, 'dip', in the first post in this topic.

Fair enough, I shouldn't have said "unavoidable", but clearly, it's the default, and that default is all wrong. The "good" behaviour should be the norm, not require extra cost and extra work by the programmer---that never works out. I'm surprised to hear you argue otherwise.

The biggest problem with this argument is that you've already established your complete (and apparently unabashed) ignorance of "the alternative" under discussion. Your arguments seem to consist of blind prejudice, false assertions (i-th element, unavoidable capability leak), and straw-man analogies (De Bruijn numbers).

That ad hominem was neither called for nor does it refute what I said. So far, your arguments consisted mostly of hand-waving the advantages and pointing to work-arounds for the disadvantages. Programmer-made work-arounds are okay as long as you buy into the premise that there are (significant) advantages, but I haven't heard a plausible point in case yet.

add anonymous tuples with

add anonymous tuples with only indexed projection to the mix then

Adding tuples won't save your straw-man analogy. Actually, I was already assuming you could model tuples in a system with De Bruijn numbers (using a Church encoding if nothing else). But tuples have very different characteristics from multiple return values on the stack - i.e. the code receiving the tuple cannot be transparent to the tuple's existence, so now you need to introduce extra logic to break apart the tuple and use it.

Also, multiple return values were only one of many reasons listed for why your analogy fails. I'm not sure why you're focusing on it in particular.

(2) one of the reasons that stack-based languages are effective at refactoring the internals of functions.

I fail to see the connection.

The ability to return multiple values onto the stack enables what in applicative code would be the refactoring of local sub-expression variables - i.e. we can transparently factor out the code that dumps a bunch of local variables onto the stack. This works well in conjunction with the ability to remove or process multiple items from the stack.

Relevantly, this is done without introducing an intermediate tuple. A tuple must be explicitly constructed, and explicitly decomposed, both of which add code and undermine the benefits of the factoring.

Often applicative code won't factor out sequences of local variables because it isn't worth the hassle (or performance risk) of building a tuple then breaking it apart. The hassle is made worse because it's very verbose (e.g. passing parameters to the function that builds the tuple, using methods or pattern matching to extract from the tuple) and the savings is often very little.
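To make the contrast concrete, here is a small sketch in the nested-pair stack encoding used earlier in this thread: a factored-out word leaves two intermediate values on the stack and the caller consumes them in place, with no tuple construction or pattern matching in between (the names are illustrative, and the list is assumed non-empty):

-- A factored-out "word" that leaves two intermediates (min and max) on the stack.
pushMinMax :: (s, [Int]) -> ((s, Int), Int)
pushMinMax (s, xs) = ((s, minimum xs), maximum xs)

-- The caller consumes both intermediates directly; no tuple is built or taken apart.
range :: (s, [Int]) -> (s, Int)
range = sub . pushMinMax
  where sub ((s, lo), hi) = (s, hi - lo)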

The "good" behaviour should be the norm, not require extra cost and extra work by the programmer -- that never works out. I'm surprised to hear you argue otherwise.

I'm a big proponent of the capability security model, robust compositional security (and compositional reasoning in general), paths of least resistance, good defaults, and abstractions that don't lie to me. It must seem strange to you, then, that I'm promoting stack-based programming, where it seems the entire environment being accessible is the normal state of things.

Thing is, I'm not. I'm promoting concatenative programming, and tacit programming. The underlying model I favor is based on John Hughes's arrow model. (See my initial post in this sub-thread.) The primitive combinators for arrows are:

(>>>) :: (a ~> b) -> (b ~> c) -> (a ~> c)
first :: (a ~> a') -> (a * b) ~> (a' * b)

In arrows, the 'entire environment' is passed through the (>>>) composition operator, which I agree is concerning for security. OTOH, use of 'first' enables cutting off a very precise chunk of the environment for further operations, and is great for security. Also, every use of first is static - i.e. the amount of environment accessed never depends on runtime values. This makes it easy for developers to be aware of how data and authority flow within the language at compile-time.

(Edit: Cat and Kitten get the same 'static' advantage by use of typed functions, albeit at some cost to expressiveness unless they use explicit macros.)
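As a reference point, the plain-function instance of these combinators in Haskell's Control.Arrow already behaves this way: `first f` statically limits f to one component of the environment while the rest passes through untouched (the function names below are illustrative):

import Control.Arrow ((>>>), first)

-- Only the first component of the pair is visible to the subprogram; the
-- rest of the environment passes through untouched, and the split is static.
step :: (Int, rest) -> (Int, rest)
step = first (+ 1)

pipeline :: (Int, rest) -> (Int, rest)
pipeline = step >>> step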

In the concatenative language, structured types such as (a * (b * (c * d))) can be used to model stacks of varying size and type. An argument to 'first' is a static 'block' (function object) placed on the stack rather than passed as an applicative parameter. (This is nice for composition, but does result in a Turing-complete model.) In a simplistic (non-extensible) environment, this might look like:

[fooAction] first

The blocks syntactically stand out. I think they even look a bit like jail cells. So we do have a syntactic means to easily recognize when code is idiomatically being applied to some subset of the environment.

Of course, Awelon's environment is a bit richer, and developers will use combinators other than 'first' based on how much of the environment they wish to provide - e.g. there might be a special combinator that passes on the first one, two, or three elements on the stack, or developers might create one that passes on those elements plus a couple globals. Developers must know the combinators if they want to understand how the environment is impacted... or automatic stack visualization would also do the trick.

This turns out to be a major boon for expressiveness. I want security, like an object capability language, but I also want expressiveness and abstractive power. With tacit programming for arrows, I now can get both - without syntactic muss or fuss. Even the problem of 'forwarding a specific set of capabilities and data' can be refactored into user-defined combinators. (Not something easy to do with applicative code.)

Beyond all that, I also have another feature for security, which is static support for sealer/unsealer pairs. I can, when I wish, encapsulate parts of an environment and hide them from any subprogram that lacks the unsealer. Makes parametric code easy to write. With a decent library, I could also use these to model ADTs or objects.

Edit: I find it baffling that you consider combinators and types a "work-around". Should I remind you of such fine examples of applicative languages as C, JavaScript, and Lisp? If I were to guess, I'd say concatenative languages have a much better record for controlling access to the environment than applicative languages do (on a percentage-of-languages basis).

That ad hominem was neither called for nor does it refute what I said.

I had already refuted what you said (De Bruijn false analogy, i-th element, unavoidable capability leak). The slight on you was not part of those refutations, but an observation on the quality and nature of your arguments thus far.

If you think it was unwarranted, I suggest you consider how I might take you implying I'm a "masochist" or involved with concatenative programming for a shallow "sense of achievement". You should also consider how using phrases like "AFAICS" and "my experience" as part of your arguments will place your sight and experience under valid scrutiny.

Programmer-made work-arounds are okay as long as you buy into the premise that there are (significant) advantages, but I haven't heard a plausible point in case yet.

My strategy with you has not been to present advantages, but rather to (attempt to) break you from your misconceptions (i-th element, capability leaks, the De Bruijn analogy, that "Names are an essential abstraction to reduce cognitive load.").

You've already acknowledged a few advantages - such as "I agree that names can have their problems, too", or "low-level syntactic aspects of cutting & pasting code segments". You've even acknowledged the potential expressive power of full environment manipulations, though you did so in terms of security risks and abstraction leaks rather than in terms of constructing DSLs with sentinels and full-stack processing.

I consider those advantages to be valuable; the 'problems' of names are really quite severe, especially for safety-by-construction, which is in part why arrow models or box-and-wire programming have been created. The syntactic challenges of factoring are relevant - without the syntactic hassle, refactoring happens more often and at a finer granularity. The expressiveness of accessing the larger environment is useful for extensible languages and metaprogramming, even if we also address security concerns.

Convincing you of their value seems pointless. You've already indicated your dismissiveness towards such ideas. ("luxury problems to have", "not terribly important to me", "clearly, that default is all wrong").

The syntactic properties of concatenative programming seem advantageous, btw, for fast parsing (almost no syntax), specifying rewrite optimizations (transform sequences of words into new sequences), automatic code generation (a program directly corresponds to a search path), genetic programming (splicing and recombining linear sequences of code), and hygienic macros (no risk of name capture). I think it's hard to argue against the technical properties here, but I've only mentioned these so you can dismiss them.

Go ahead. I've got a few guesses about which dismissive tactics you'll use.

The Great Tuple Subthread

I'd side with Andreas here: I don't get your points about tuples.

The ability to return multiple values onto the stack enables what in applicative code would be the refactoring of local sub-expression variables - i.e. we can transparently factor out the code that dumps a bunch of local variables onto the stack. This works well in conjunction with the ability to remove or process multiple items from the stack.

I hear you say two things:
(1) it's nice to return multiple values
(2) there is a duality with taking multiple arguments that is important

Scenario for point 1: I want to combine several producers each returning a value, and have that combination be a first-class citizen in the language (no overly heavy syntax is enforced).

Scenario for point 2: I want to compose a producer and a consumer, be able to change them in sync to pass one more value, without having to touch the code that connects them together.

It seems to me, however, that both of those points are enabled with tuples in applicative code with named variables. Point (1) would be the return of a tuple (which can be either immediately decomposed into separate names by the caller, or kept as a tuple when it is a good value abstraction). Point (2) would simply be a function call `consumer(producer)` where the type of `producer` is a tuple.

Relevantly, this is done without introducing an intermediate tuple. A tuple must be explicitly constructed, and explicitly decomposed, both of which add code and undermine the benefits of the factoring.

In my `consumer(producer)` example, there is no syntactic overhead. If you don't want to manipulate an intermediate tuple, you'll write something like

let (foo, bar, baz) = producer in
...

I don't think this can be considered syntactically heavy. We're
introducing names, but there is no extra syntactic overhead to
introduce several names instead of one. In fact this is an exact
ML-style counterpart of Lisp's syntax for multiple-bind,
multi-value-return -- which means that an ML language *without*
a concept of multiple values subsumes, in theory and in practice,
the use cases of multi-value return in applicative languages.

You can add a notion of multiple value return to ML languages: you add
"value types" that have a different kind from the usual single-word
values, which limits the level of abstraction you can use on a bunch
of values -- one very strict way to enforce this is to disallow naming
"values" of these value types, which amounts to forcing the expanded
elimination form `let (foo, bar, baz)` everywhere. You don't gain
anything in expressiveness (only in closeness to some performance
model), so I don't think that would help in this discussion.

Often applicative code won't factor out sequences of local
variables because it isn't worth the hassle (or performance risk) of
building a tuple then breaking it apart. The hassle is made worse
because it's very verbose (e.g. passing parameters to
the function that builds the tuple, using methods or pattern matching
to extract from the tuple) and the savings is often very
little.

I emphatically disagree with your "it's very verbose". ML syntax for
passing a tuple to a function is in fact exactly the syntax for
passing multiple arguments in C-syntax languages, which I don't think
is perceived as verbose. I've already pointed out that the
deconstruction syntax also coincides with the most natural
(and rather light) deconstruction for multi-value return.

Re. De Bruijn indices. They are often used in formalizations of core
calculi that don't bother with multiple-name binders, so we have less
experience with them (binding theory papers, e.g. the "Locally
Nameless" paper, note that they don't handle that case
for simplicity). However, it is extremely natural to extend De Bruijn
indices with multiple binders: just give them consecutive indices.

De Bruijn syntax being nameless, the lambda-abstraction syntax is not
(lambda x t), but simply (lambda t) (implicitly shifts binders and
redefines 0). The single-value let-binding form would be (let e1 e2)
(binds 0 and shifts in e2). The tuple-deconstructing let-binding form
would be (let-tuple e1 e2), where e1 returns a tuple of length
N (or multiple values if you want to read it that way), and indices
0..N-1 are rebound in e2, with the rest shifted by N.

This gives you a light multi-return syntax that, I agree with Andreas,
rather closely corresponds to what happens in concatenative
languages. A corresponding multi-argument syntax would be
(lambda-tuple t), where you may have to note the size of the input
tuple statically, either because your type inference is not good
enough (it's not as immediate as with let-tuple, since you have to
look at use sites) or because you find that it helps readability --
just as we mentioned explicit type annotations to limit access to the
caller's stack in a concatenative language, since there is no obvious
"implicit" information about how many values to take as input.
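
For concreteness, here is a sketch in Haskell of a De Bruijn term language extended with the multi-binders just described (the constructor names are my own, not from any particular paper):

data Term
  = Var Int                 -- De Bruijn index
  | Lam Term                -- (lambda t): binds index 0, shifts the rest
  | App Term Term
  | Let Term Term           -- (let e1 e2): binds 0 in e2
  | Tuple [Term]
  | LetTuple Int Term Term  -- (let-tuple e1 e2): e1 yields an N-tuple;
                            --   indices 0..N-1 are bound in e2, the rest shifted by N
  | LamTuple Int Term       -- (lambda-tuple t): takes an N-tuple, binds indices 0..N-1
  deriving Show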

Tuples don't help the De

Tuples don't help the De Bruijn analogy. That's my point. They don't help because returning multiple values as a tuple is qualitatively different than returning multiple values on the stack, especially with regards to factoring code.

You hand-waved over the issue of operating on an intermediate tuple. For some reason, you even deconstructed a tuple as `(foo,bar,baz)` and suggested it wasn't a manipulation of an intermediate tuple. But this manipulation is important to the qualitative difference. The need for it is why tuples don't help Andreas's case.

A common case is that you want to modify different elements of the data being passed to the consumer. With applicative programming, the code ends up looking like this:

(foo,bar) = producer
baz = makeBaz foo
consumer (baz,42,bar)

With multiple return values on the stack, we don't need to explicitly pick apart any tuples or put them back together.

producer makeBaz 42 swap consumer

Here 'swap' can be considered semantic noise.

What happens in practice - in "well factored code" - is that producers and consumers are arranged so that, when tweaks need to be made, they almost always involve the first output and input, or the first two. (It may seem arbitrary, but such scenarios come up a lot in practice.) Also, there are many cases that (due to careful factoring) end up looking more like this:

(foo,bar) = producer1
qux = producer2
baz = makeBaz 42 qux foo 
consumer(baz,bar)

Which corresponds to concatenative code:

producer1 producer2 42 makeBaz consumer

This sort of code (no stack manipulations at all) happens only by design, but after a while it starts happening often without much thinking about it (like Haskellers designing functions that are easily curried). OTOH, beginners in concatenative programming tend to have messy stacks, and noisy stack manipulations, both of which make it even harder to keep track of a stack. It's a really harsh learning curve compared to use of names, though I think that could also be addressed.

Mucking around with foobar examples probably doesn't reveal some of the more common structures that can be factored out of large functions, where applicative code often has difficulty. I find there is often a lot of repeated internal-function structure within or between functions regarding, for example, subsets of else-if-then sequences or boilerplate initializers. Many factorings that would take macros in other languages can be done directly in concatenative programming.

I emphatically disagree with your "it's very verbose". [..] syntax for passing multiple arguments in C-syntax languages, which I don't think is perceived as verbose.

It is very verbose compared to tacit concatenative programming.

From that perspective, C does the following things that are very verbose:

  • parameter declarations take horizontal and vertical space
  • using descriptive parameters in expressions => wide expressions that take lots of horizontal space
  • to reduce horizontal space, assigning sub-expressions to intermediate named locals
  • the LHS declaration of each intermediate named local also takes horizontal space, so the previous workaround saves little
  • every time you factor code or extract a method, you need yet another set of name declarations

Use of descriptive names is especially bad for verbosity. But even non-descriptive names are very verbose compared to tacit concatenative programming.

People comfortable with applicative and unfamiliar with concatenative tend to imagine that concatenative programming will need almost as many stack manipulations as they use names. This impression might be reinforced when they, through inexperience, make a mess of the stack. But in practice (at least for experienced programmers) there are fewer stack manipulations than parameters, and concatenative programmers also avoid many intermediate names and parameter declarations.

the deconstruction syntax also coincides with the most natural (and rather light) deconstruction for multi-value return

Most natural deconstruction? Returning multiple values directly on the stack seems very natural to me. And even if you do return a tuple as a value, I can factor out the code to deconstruct it (i.e. code to take a 3-tuple and turn it into 3 items on the stack).

I appreciate multiple values being returned directly on the stack, since constructing and deconstructing tuples would hinder refactoring.

This gives you a light multi-return syntax that, I agree with Andreas, rather closely corresponds to what happens in concatenative languages.

Even if tuples are "lightweight", you cannot operate on returned tuples as directly as you can multiple returns on the stack. The difference is obvious and significant to anyone who has done much concatenative programming.

Many concatenative languages do have some support for tuples or lists as values. Mine is among them. Multiple values can be returned either directly on the stack, or via tuple. In my experience, multiple returns on the stack is much better for factoring, while a tuple or list is useful when I want to treat structured objects (e.g. a complex number) as a single value.

The "good" behaviour

The "good" behaviour should be the norm, not require extra cost and extra work by the programmer---that never works out. I'm surprised to hear you argue otherwise.

I don’t believe he was arguing otherwise. You can have a stack-based language that only allows static/safe effects, and you can have a concatenative language that’s not based on stacks: arrows, queues, and graphs are all effective models. In addition, Kitten and Cat are type-inferred, so there is no burden on the programmer to ensure that a function’s effects are localised. Kitten’s stack is simply an efficient way of mapping its semantics onto real hardware. And inference is not free, of course, but its value more than outweighs its cost.

stack-based languages are effective at refactoring the internals of functions.

I fail to see the connection. That's an unfounded claim that neither matches my experience, nor makes any sense to me technically.

The refactoring abilities of a concatenative language are very general, which makes them difficult to describe both succinctly and convincingly. The situation is similar with monads, where you have a proliferation of analogies with burritos and spacesuits and who knows what else, for what is essentially a very simple concept.

The first example we usually trot out is that it’s very easy to “extract method” in a concatenative language—cut, paste, name. But the key point is that unlike in Joe’s Imperative Language, this is something you do all the time because it is so easy. Definitions rarely exceed a few lines of code.

It’s also not just about strings of the same function calls, but sequences which can be arbitrarily interleaved with calls to other functions. They describe a DAG of data and control flow—factoring is what lets you collapse arbitrary subgraphs into individual nodes, reducing program complexity in a very quantifiable way.

dup for example is the pattern of sending the same data to multiple places:

var x = f();
g(x);
return h(x);

In Kitten, with locals:

f ->x
x g
x h

Without:

f dup g h

As a DAG, dup is simply a bit of program wiring that does this:

 |
/ \

At any point where your program does that, you can use dup because the puzzle piece fits, by its shape and more formally its type. And there are many more useful stack combinators which give us a common language for the shapes of subprograms, in the same way that folds give us a common language for subcomputations with the power of an FSA.

That is the key insight: concatenative programming is about abstracting and composing flow structures.

f dup g swap h

I'm pretty sure your `f dup g h` code was an incorrect translation. This serves as a fine example of the risks of in-the-head visualization. Hope you don't mind. :)

Assuming f and g each return one item, the correct code would be:

f dup g swap h
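
To make the shapes explicit, here is a sketch of dup and swap over a nested-pair stack encoding; the types X, Y, Z and the functions f, g, h are placeholders for the example above, not anything from Kitten:

dup :: (s, a) -> ((s, a), a)
dup (s, a) = ((s, a), a)

swap :: ((s, a), b) -> ((s, b), a)
swap ((s, a), b) = ((s, b), a)

-- Suppose
--   f :: s -> (s, X)        -- pushes an X
--   g :: (s, X) -> (s, Y)   -- consumes an X, pushes a Y
--   h :: (s, X) -> (s, Z)   -- consumes an X, pushes a Z
-- After `f`, `dup`, `g` the stack has shape ((s, X), Y): the X that h still
-- needs is buried under g's result, which is why the swap is required.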

Automatic visualization of inferred types, and live programming where we can continuously see the outputs for a sample input, are technologies that could help a great deal for all programming models.

But I think they'll help a great deal more for tacit concatenative programming.

My hypothesis is that named locals will interfere with automatic visualization - due to repetition of names, verbose code, lack of a clear 'environment' type, jumpy data-flow via names. Parameters and local variables are simultaneously a "poor man's" visualization tool, a verbose and semantically dubious dataflow representation, and an impediment to automated visualization tools.

Thanks

Changed the example. My excuse is that it’s very hot today and my brain is not operating at peak efficiency. ;)

Thanks :)

Thanks for proving my very point though. :) Despite dmbarbour's remarkable attempt to spin it in a different direction: obviously, the named form is trivial to get right and trivial to read, while the stack-only form is neither -- even for such a trivial example, and even for somebody as experienced as you.

As for flow abstractions: I keep hearing that, but I don't get the argument. Applicative languages can abstract flow just as well -- but they don't necessitate it. They provide more high-level structure than a pure stack language, in that they can express tree and DAG-like flow directly, instead of having you encode everything down into sequences. You use sequentialised flow (and define abstractions like arrows) only where that is a suitable abstraction. That seems like a clear win to me.

Non-trivial variable plumbing

In my experience, algebra is full of examples of variable plumbing that are hard to express in nameless languages. The example that tricked Jon exposed non-linearity, and this is difficult. Non-local effects, where the same variable is needed in a distant part of a computation, are another. A list of examples:

scalar_product (a1,b1) (a2,b2) = a1*a2 + b1*b2
linear, local, easy in stack languages

norm2 (a,b) = a*a + b*b
non-linear, local, not too difficult

complex_product (a1,b1) (a2,b2) = (a1*a2-b1*b2, a1*b2 + b1*a2)
non-linear, non-local, difficult

Getting the nameless form of those is often enlightening (eg. you can get an expression that scales better to adding a third dimension), just like Bird-Meertens squiggol-ish expressions of familiar operations have a certain hieratic beauty. One may argue that we're biased in favor of the nameful syntax due to the existing practices of domain experts, though on a small scale the plumbing often looks like something we want the compiler to do for us (... by using names).
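
As a rough illustration of what the nameless form looks like, here is norm2 in Haskell point-free style (applicative point-free rather than a stack language, but it shows the flavour; `join (*)` squares its argument). Doing complex_product this way is where it gets painful:

import Control.Arrow ((***))
import Control.Monad (join)

norm2 :: Num a => (a, a) -> a
norm2 (a, b) = a * a + b * b        -- named, for reference

norm2' :: Num a => (a, a) -> a
norm2' = uncurry (+) . (join (*) *** join (*))   -- nameless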

This another aspect of

This is another aspect of programming languages that is psychology instead of mathematics. Are names an accident of history or inherently easier for humans to understand? I think the latter, but I hope even those who disagree will agree that in our current culture, code with names is much easier to express and understand. Even stack based languages use names for their operators. You could imagine a stack based language without names at all, where you hold the functions that you want to call on the stack just like normal data, and then you use a CALL operator to call them (in the same way that functions are ordinary data in functional programming languages). That would be truly horrible to program in, since even getting to a primitive operator like + would require stack manipulation.

psychology instead of

psychology instead of mathematics. Are names an accident of history or inherently easier for humans to understand?

We should be asking: under which conditions is it easier for humans to understand and use names?

Do you use names for the clothes in your closet? Do your shoes have names? your legs? What about the last apple you ate? Your car keys? Your car? If names offered an easy way for humans to understand things, then why don't humans use them more often?

My observation: We use names primarily to identify distant people, important places, and important things. For other distant items, we're more likely to use identifying descriptions along with general location. For nearby items, we're more likely to communicate by pointing or glancing, leveraging our visual and spatial abilities.

Humans have very advanced visual processing and spatial memory capabilities, and we are decently skilled at structural manipulations. Further, this seems to be our preferred modus operandi for the bulk of interactions with our environments and a fair portion of our communication with other humans.

Shouldn't programming be the same way?

Even stack based languages use names for their operators.

Sure. It's only naming local data (parameters, local variables) that is under scrutiny.

For nearby items, we're more

For nearby items, we're more likely to communicate by pointing or glancing, leveraging our visual and spatial abilities.

So you see concatenative dip trip flip expressions as the equivalent of pointing? A closer analogy to me would be a conversational style where the participants are supposed to keep track of a stack of objects that they're discussing and every conversational phrase rearranges the stack. :) It seems to me that you're complaining that humans aren't good at names and then advancing as a solution something at which they're even worse.

And to then argue that the IDE is going to be playing this mini-game for us, so it doesn't matter -- well, then why introduce it at all? Why not just say that your syntax is a graph, hooked up by pointing, and for the interface pick something stable. For example, have the UI introduce names (a, b, c, ...) for locals or parameters, and then reuse them as they're consumed. Maybe have a syntax for using a symbol without consuming it for explicit duping.

Or, you could look at that scheme and decide that it's better to let programmers introduce names for those symbols because they can pick mnemonically superior symbols. And that dup detection is best handled by the IDE. And then you're back at the system everyone else uses, except that you have a nice graph presentation of the term that doesn't overemphasize the binding structure. (And you can still forbid implicit shadowing!)

Metaprogramming, Visualization, and Problematic Names

then why introduce it at all?

One reason is metaprogramming.

Graphical PLs tend to become unwieldy at scale. Humans can handle some of the "in-the-small" tough problems by pointing, but many of the in-the-large ones should be addressed automatically. I need a formal model of the environment and operations on it to model and understand interactions at the larger scale.

Anything the human can do by navigating, pointing, and manipulating, must have names so the programmer can automate the acts of navigating, pointing, and manipulating. (This is what distinguishes the programmer-model from the real programmer.)

Another reason is visualization.

Visualization is easy to address for tacit concatenative programming. I have a well-typed environment at any given point, and I can visualize that type over space. I have a simple linear model for operations on the environment, which is easy to treat as time. I have well-typed dataflows, and I can thus easily trace or animate the movement of data.

As I've said before, tacit concatenative code has both much more to gain from automatic visualization, and much greater potential for automatic visualization, compared to name-based code.

Also, names have problems.

The dataflows using names are implicit, and not explicitly typed. Names can be used in many unsafe or inefficient ways. It becomes non-trivial to analyze for safe uses of substructural types, or unsafe/inefficient name-capture in staged or distributed programs. Even if a computer can analyze for them, humans can have trouble understanding the issues.

Names effectively teach bad intuitions about reference vs. referent and valid dataflow. The mini-game, by comparison, would teach valid intuitions... and this can be directly extended to dependent types, in a form much akin to assisted theorem proving.

Name-based code seems to be relatively verbose, and lacks any way to refactor out the basic declarations and uses of names. I believe that the pervasive use of names actually becomes a form of semantic noise that interferes with understanding of the code. Not as much as trying to visualize dataflow in one's head without automatic support, of course. But the abundant proliferation of names for local data and parameters can get in the way of reading and understanding behavior.

And then, of course, there are other psychological problems, like decision fatigue, how names really aren't something humans naturally use in-the-small, and accidental shadowing.

Sure, some of these problems can be addressed or patched over. But avoiding names is a simple solution to the problems of names, and thus one that can be exercised consistently.

...

I believe we can obtain a better programming experience by marginalizing use of names, and formally modeling (e.g. via staged metaprogramming) the names that remain. But support of the IDE is the other essential ingredient to that experience. If names are out of the way, we can replace them - with better visualization, better automation, better composition, better code generation and factoring, better safety-by-construction properties, better intuitions. Potentially better UI/PL integration.

How is this possible?

I don't understand how names could have serious semantics disadvantages in any of these categories when there is a straightforward translation from names to concatenative style. That such a shallow translation is possible to me implies that the use of names is just a syntax/user interface issue. Further evidence is that you're talking about using names for non-local reference and there isn't much of a semantic difference between local and non-local. Unless you meant that names were going to be treated as first class references in your non-local scheme, in which case... now you have two problems :).

So you seem to be conceding that concatenative style isn't the best from an HCI standpoint, but I'd argue that HCI is exactly what you need to be worried about when picking a surface level syntax, which is all I see this as.

how could names have serious

how could names have serious semantics disadvantages in any of these categories when there is a straightforward translation from names to concatenative style?

There is no simple translation.

Not for names as they are used in applicative programming.

Unless you meant that names were going to be treated as first class references in your non-local scheme, in which case... now you have two problems :).

Yes, that's what I've been saying. We formally distinguish reference and referent, and we formally model the relationship between them. In blunt terms: we use an association list, probably with text as the key. We can take or copy objects from this list, or add items to it.
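
A bare-bones sketch of that association-list model in Haskell (the function names here are my own, not Awelon's actual vocabulary): references are text keys, referents are the stored values, and the mapping is a first-class structure we thread through the program.

type Env v = [(String, v)]

-- add (or shadow) a binding
putName :: String -> v -> Env v -> Env v
putName k v env = (k, v) : env

-- copy a referent, leaving the binding in place
copyName :: String -> Env v -> Maybe v
copyName = lookup

-- take a referent, removing the binding (a linear use of the name)
takeName :: String -> Env v -> Maybe (v, Env v)
takeName k env = case break ((== k) . fst) env of
  (pre, (_, v) : post) -> Just (v, pre ++ post)
  _                    -> Nothing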

You say this is "two problems". I disagree. It's at least three problems: reference, referent, AND the mapping between them. By modeling all these explicitly, we can use names safely and robustly, even in staged, distributed, or heterogenous systems... but we lose some of the syntactic convenience and equivalence properties. In some cases - e.g. for singleton objects or configuration variables, names that are relatively stable - the convenience of names for data-plumbing is worth trading for a few other problems.

In many other cases - e.g. for parameters and local variables, or keyword arguments - it often isn't worth it. Sometimes it is - I'd use keyword arguments if I had an extensible set of parameters with potential default values. But names shouldn't be so pervasive.

you seem to be conceding that concatenative style isn't the best from an HCI standpoint

I won't argue it's "the best", but I think it's a significant improvement over pervasive use of names. Concatenative style, and marginalizing the use of names, makes many powerful HCI features feasible. Therefore, it is better from an HCI standpoint.

Even when names are modeled in the environment, it's useful for them to be explicitly modeled for HCI purposes, e.g. so we can clearly visualize the linear nature of referents separate from references.

HCI is exactly what you need to be worried about when picking a surface level syntax, which is all I see this as.

That's what I'm doing. I'm choosing an HCI and surface syntax that doesn't come with numerous built-in problems with semantics, verbosity, intuition, fatigue, refactoring, visualization, and automation. I'm choosing a UI that more naturally fits the human, in that many local and environment manipulations don't require names.

Do you honestly think that names are the best we can possibly do? (If so, could you possibly defend that position in an argument?) How can we make programming simpler, more robust, and more scalable if we're patching over leaky abstractions like the untyped dataflow of names?

Conversations with a Stack?

So you see concatenative dip trip flip expressions as the equivalent of pointing? A closer analogy to me would be a conversational style where the participants are supposed to keep track of a stack of objects that they're discussing and every conversational phrase rearranges the stack. :)

Not quite. Pointing is something we do to communicate to intelligent beings that have similar visual and spatial capabilities as our own.

Yet, when we interact with our environments, we also do not seem to think in terms of names. (This is what I meant when I said: "this seems to be our preferred modus operandi for the bulk of interactions with our environments").

These interactions include navigation, shifting stuff around, opening and closing, scanning surfaces, plugging elements together. Many of the 'dip trip flip' operations correspond to these actions, not to a conversational style. Swap, roll, apply, some zipper navigations, take and put with the 'hand', content-based searches - these are a very good fit for our visual and spatial operations.

It doesn't take much experience to think of these operations in spatial terms. And I imagine it would take much less in the presence of continuous visual feedback.

Your attempt at an analogy to a conversational style with multiple participants would only seem relevant in a multi-programmer environment, where multiple programmers are taking turns controlling a structure. That might be interesting as a means to model some protocols (contracts, trades, games) but I grant that it could be difficult to track. Visualization would help a lot, of course.

But, for the most part, even in a multi-programmer system, I imagine a single programmer would de-facto have exclusive control of a territory and relatively few encounters of that sort.

OK, so you are arguing that

OK, so you are arguing that dip trip flip is a reasonable HCI form factor for a programmer looking at a visualization of the stack.

I can almost believe that for the programmer writing the code, but when you're reading the code surely those words are impenetrable unless they are annotated onto a graph showing what's going on. In which case, the graph is your source of understanding when reading and the words were only really useful for building the graph.

I guess that makes the situation similar to the Coq Ltac language. Again, I can believe that such an interface is useful for expert level interaction, but the transcript of such interaction doesn't seem very interesting. I think it's a bigger problem with Coq where things very quickly become too large to visualize at once and the Ltac history is just kind of a time-index into the evolving proof environment state. Maybe with the added benefit that you'll remember and recognize the labels for some of the steps.

Better for reading, too

when you're reading the code surely those words are impenetrable unless they are annotated onto a graph showing what's going on

Visualization (and preferably some interaction) is also important for reading the code, understanding its meaning or behavior in context and with a variety of inputs. There are many ways to render functions, and live programs can help.

I'm not sure what you are imagining when you say "annotated onto a graph", but concatenative code has a rather nice property of looking just like a timeline. There are many ways to visualize time - before and after pictures, an image for each word in a concatenative program, animations. Objects can be highlighted as they're manipulated, perhaps glow warm and cool down for a while. We could even animate a character or disembodied hand wandering around to perform actions. If readers wish to trace certain objects or areas, it would not be hard to tag objects in the visualization, highlighting them. I believe the precise nature of dataflow would make objects easier to follow than with names.

I think applicative code would severely interfere with many rich visualizations. There is no clear environment type to render. The dataflow is implicit and jumpy. Aliasing and linearity are harder to represent. The code itself is more verbose, leaving less space for a visualization canvas. Names the programmer doesn't care about are semantic noise that is difficult to remove from the visualization.

The primary advantage applicative code has for reading seems to lie in the absence of automatic visualization - i.e. for a static page of code, visualizing in our minds, using the names as visual cues. For reading code, names are a poor man's visualization tool.

the transcript of such interaction doesn't seem very interesting

It's just as interesting as "the code" in any program. Which is to say: it's mostly interesting to someone who is trying to understand the program, learn from it, or debug it.

things very quickly become too large to visualize at once

True. Naturally, we'd want some ability to scope what we're looking at, both in space (parts of the environment - places or things) and time (volumes of operations). I don't see any inherent difficulty with this.

I think that one reason Coq becomes harder to follow is that it implicitly uses search rather than deterministic operations that (in some cases) might explicitly encode search. Another reason is that the visualization in CoqIDE is more textual than graphical, and doesn't make good use of color and placement.

Visualization is also

Visualization is also important for reading the code, understanding its meaning or behavior in context. I'm not sure what you are imagining when you say "annotated onto a graph", but concatenative code has a rather nice property of looking just like a timeline.

Do you have something promising along these lines? Point-free code tends to be concise and read left-to-right, so I was thinking a simple IDE could just display the stack type top-to-bottom under each value/operator. Something like:

sum = foldr         (+)        0
      ['a->'b->'b]  ['a]       ['a list]
      ['b]          ['a list]  ['a]
      ['a list]     ['a]
      ['b]

Top-to-bottom displays expected stack values, left-to-right describes stack transformations. Reading off the last one is the inferred type of the operator you're defining. -> is used to describe higher order operators in this context.

Representing part of the

Representing part of the environment object under each word is one potential visualization I was considering. I'd use more colors, shapes, concrete values; fewer symbols. I can imagine hovering over an object in one of these representations to see how it flows through the others (drawing a trace arrow). I have several other visualizations in mind.

Bret Victor's "Drawing Dynamic Visualizations" seems appropriately related.

I imagine a programming environment that puts the concatenative code itself more in the background. Not hidden from the programmer, so much as sitting in a smaller portion of the screen than the environment/process representations and any 'live' programming examples.

I think timeline/animation on an environment can work very well. Developers can visually see the dataflows involved with accessing global configuration values or singletons in the extended environment; it becomes trivial to understand how code is entangled, or to follow important objects (e.g. linear or relevant types, global objects, etc.) to see how they are used through the environment. I think animation will be a very big part of understanding.

Many of my ideas are specific to staged programming with the arrows model. I'm not sure how well they can generalize to stack based languages that perform side-effects or observe runtime inputs. A consequence is that I can often focus on concrete values that are available to the compile-time, and I don't need to deal as much with homogeneous typed lists and so on (all compile-time lists in Awelon are heterogeneously typed, as are all zippers, etc..).

on a small scale the

on a small scale the plumbing often looks like something we want the compiler to do for us (... by using names)

The alternative I'd like to pursue is that "tough" small-scale plumbing becomes something we have our IDE help us with, using a PL that can have graphical elements where convenient. I.e. manipulate objects in the stack visualization and generate code to perform said manipulations. I think this would naturally appeal to the human spatial/visual capabilities, far more than using names. It should also teach better intuitions about how data and objects can be manipulated, compared to guessing at what the compiler does.

I envision a VR/AR programming environment. I sure wouldn't want to use names to access and manipulate most things in such an environment. I need a formal concept of moving objects in such an environment, and I want safety by construction. So tacit concatenative programming seems a more promising formal basis, even if I model names for some purposes (like different workspaces).

Many stack languages do support local names (Kitten and Factor, for example). I can't find the ref, but I recall a note indicating that 'as of this writing' Factor used named locals for around 308/38k words (less than 1%) in its internal codebases (optimizers, IDE, etc.). I can model local names in Awelon using metaprogramming, but I can't imagine them being used much more often than that.

I might be tempted for your complex_product example, if I wasn't able to use a little visualization and code gen. More likely I would break it into two linear problems.

one way to do it is often better than two

I'm not here for "spin" or sophistry. My interest is developing a better programming experience. I've never argued that concatenative code is easy to read in the absence of visualization. But if that's your "very point", I believe it is easily addressed with technology.

Applicative languages can abstract flow just as well -- but they don't necessitate it

What I've found is that mixed-mode ad-hoc applicative with compositional dataflow is a really awful programming experience. In context of the arrows model in Haskell, there are two cases where applicative code is often used:

  1. some operators (e.g. arr, first) take applicative parameters. The consequence was that those particular inputs could not be abstracted in the arrow itself
  2. in some cases, I want to perform a little static metaprogramming, e.g. repeat an arrow N times using a parameter N rather than repeating by hand. Again, this would take applicative code.

When these two influences are combined, they contribute to a lot of ad-hoc dataflow through the applicative model. Essentially, I would be abstracting "parameterized configurations" of components.

The problem was that this dataflow was not very compositional. The ad-hoc dataflow becomes a glue globbing everything in a parameterized configuration together. It was difficult to factor components, and conversely it was often challenging to ensure that parameters to different subcomponents are related in sensible ways.

I found I really wanted the ability to parameterize via the dataflow, such that I can separate the site where a parameter is constructed from the site where it is applied in a compositional manner. When I came up with the idea of modeling 'static' type parameters within the arrow itself (thus representing a staged computation), that proved suitable to solve the problem.

It also inherently resulted in a staged concatenative language.

My own experience reminded me that more choice isn't necessarily better. In this case, it seems that shoving everything into the dataflow is both conceptually simpler and much more compositional. There are also fewer things to learn - there is only one way to do it.

they can express tree and DAG-like flow directly

Arguably, the only direct expression of a tree and DAG-like flow is to literally draw a tree - i.e. as done in many graphical PLs.

Applicative code does enable ad-hoc dataflow by use of names, though. Each use of a name can be considered a 'goto' for data. This makes them convenient, but also makes it difficult to structurally enforce safety properties that might have been protected in a more direct representation.

instead of having you encode everything down into sequences [..] use sequentialised flow (and define abstractions like arrows) only where that is a suitable abstraction.

I'm finding that it's always a suitable abstraction. Encoding operations in sequences is actually quite pleasant:

  • To encode operations on lists or tree-like structures, we can use zippers, lenses, folds, traversals.
  • Zippers are close to the physical intuition of editing and navigating a document with a finger or cursor (see the sketch after this list).
  • It's very concise, since we don't have the overhead of naming things or referring to things by name when they are within easy reach.
  • It's a good fit for relatively linear media like text; composition in syntax as well as semantics results in easy syntax-layer factoring.
  • It's a good fit for safety-by-construction, since we have a more precise model and syntactic representation for every dataflow, copy, and drop operation on data.
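
Here is the list-zipper sketch promised above (in Haskell; the names are mine): a cursor into a sequence, so local edits don't need names for positions.

data Zipper a = Zipper [a] [a]   -- reversed prefix, then suffix (focus first)

left, right :: Zipper a -> Zipper a
left  (Zipper (p : ps) xs) = Zipper ps (p : xs)
left  z                    = z
right (Zipper ps (x : xs)) = Zipper (x : ps) xs
right z                    = z

-- overwrite the focused element, if any
putHere :: a -> Zipper a -> Zipper a
putHere x (Zipper ps (_ : xs)) = Zipper ps (x : xs)
putHere _ z                    = z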

And then there are indirect advantages, since tacit concatenative programming also results in easier parsers, compilers, optimizers, code generators, and so on.

My own experience reminded

My own experience reminded me that more choice isn't necessarily better. In this case, it seems that shoving everything into the dataflow is both conceptually simpler and much more compositional. There are also fewer things to learn - there is only one way to do it.

On that note.

As for concatenative vs. non, I think names are good for configuration and documentation. But as Ruby on Rails popularized, convention over configuration is often superior. Concatenative style is the ultimate convention, which offers tremendous flexibility by reducing the language primitives to the minimal set required, and as the SF-calculus showed, it allows trivially extending the language in quite novel ways, including with metacircularity.

The loss of documentation is a real downer though, and debuggers are very geared towards inspecting named variables, but if you're going to augment the default programming experience with an IDE that provides whatever concatenative programming typically loses, then it might work out just fine.

I think the long history of Forth, and the growth of Factor has demonstrated that usability isn't the real barrier for such languages, any more than usability was a barrier to the adoption of ML.

SF-calculus showed, it

SF-calculus showed, it allows trivially extending the language in quite novel ways, including with metacircularity.

I think attributing this to concatenative style or a lack of names is probably wrong, but I don't know enough about the SF-calculus to say that definitively.

I think the long history of Forth, and the growth of Factor has demonstrated that usability isn't the real barrier for such languages,

That a barrier isn't insurmountable doesn't mean it isn't a barrier. Do you think the parenthesis soup ever stopped anyone from using Lisp?

I think attributing this to

I think attributing this to concatenative style or a lack of names is probably wrong, but I don't know enough about the SF-calculus to say that definitively.

Using the SF-calculus trick to reflect at the level of lambdas and binders is an open problem, according to the recent papers I've been reviewing. I wasn't claiming that you can only reflect concatenatively, merely that the inherent simplicity and flexibility of the primitives of concatenative programming made it "easy". It doesn't seem so easy once you have binders.

That a barrier isn't insurmountable doesn't mean it isn't a barrier. Do you think the parenthesis soup ever stopped anyone from using Lisp?

Absolutely, but that's neither here nor there, because regardless of any barriers syntax and semantics may pose, what matters is the expressive power of the paradigm (seems very high), and how maintenance scales (unclear, which is where David will likely have his toughest fight).

As one hypothesis: if the lack of names poses a serious burden on the human mind, then that will encourage writing even smaller named functions, which in concatenative programming will probably end up being quite reusable. The deficiency of names may thus end up making programs more modular and reusable.

Perception of

The ability to refactor code easily means it happens often. Small words, with descriptive names, often make it easy to "guess the intention" of the code that uses them. Coupled with some visualization and animation (which is available due to the staged programming model), I think maintenance won't be a problem.

But perception of maintenance being a problem will probably be a problem for a long time. People seem to panic every time you take something away (goto, pervasive state, ambient authority, local state, events, names, etc.), even if you replace it with something better. I wonder if people might accept it easier if they're first exposed to the graphical programming, with the concatenative code generated in the background - available for investigation and learning by osmosis.

The concatenative code would then help for environment extensions, metaprogramming, pattern detection (programming by example and a few code tweaks), automatic refactoring, recognition, rewriting, replay...

People need to start concrete and start small. Concatenative has the potential to make it scale.

concatenative the ultimate

Concatenative style is the ultimate convention, which offers tremendous flexibility by reducing the language primitives to the minimal set required

I was quite surprised at just how flexible these languages are - easily competitive with a pattern calculus or vau calculus, due to the direct access to the environment, but it also seems easier to control access to the environment. And the metaprogramming is naturally hygienic.

I had only done a little concatenative programming before starting mine. Initially, I was just going for concatenative byte-code for RDP (software components, agents, distribution language). But its simplicity and expressiveness surprised me, so I lifted it into a full language. Then I developed the programmer-model idea, which can address many difficulties I was having for AR/VR programming environments. This is taking me in some exciting directions.

So, yeah: I agree that these languages offer a great deal of power by convention.

kittens

if you have two functions of type (r a b c → r d), then you can copy the top three elements of the stack, run those two operations concurrently, join on both, and return two values of type d

The concurrency model you're suggesting would be very inexpressive. Consider instead introducing promise or future support, such that when you spin up an (r a b c → r d) computation, the 'd' is immediately left on the stack as a future and you only wait for it if you try to observe it in a significant manner.

Another good approach is to add 'channel' support: create a bounded-buffer channel for one-to-one communication, communicate through it by pushing messages or waiting on them. (A future can be modeled as a single-message channel.)

Regardless, spinning up a fixed number of threads then joining? You can do much better than that.
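
For concreteness, a minimal sketch of the future idea in Haskell rather than Kitten, using only base and Control.Concurrent (an MVar is itself a one-slot channel, and a write-once MVar models a future):

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, readMVar)

newtype Future a = Future (MVar a)

-- start the computation now; the caller gets a handle back immediately
future :: IO a -> IO (Future a)
future act = do
  cell <- newEmptyMVar
  _ <- forkIO (act >>= putMVar cell)
  pure (Future cell)

-- block until the result is available (or return it at once if already done)
wait :: Future a -> IO a
wait (Future cell) = readMVar cell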

stack-like nature can enable use similar to lexically scoped variables, assuming disciplined developers

Never assume that.

I suppose some metaprogramming could also support the convention, i.e. use the word `{` to create a new scope-object on the current stack, use `var` to add words to it, and `}` to end the scope and try to drop the variables (a type error if any of them has a type we don't know how to drop).

Good points

Consider instead introducing promise or future support

As long as the only observable difference is performance, then that is probably the better approach.

Another good approach is to add 'channel' support: create a bounded-buffer channel for one-to-one communication, communicate through it by pushing messages or waiting on them. (A future can be modeled as a single-message channel.)

Channels are just a generalisation of mutable references (with a buffer size of n instead of 1). Perfectly fine concept—doesn’t mesh well with Kitten.

Regardless, spinning up a fixed number of threads then joining? You can do much better than that.

You’re right. I was wooed by the conceptual simplicity. A par word forks green threads, and a scheduler keeps OS/hardware threads busy:

def parmap:
  ->{ xs f }
  option (xs last): ->x
    {xs init f map} {x f@} par append
  else:
    []

I’m sure the equivalents with futures or channels can be equally nice or nicer, though.

From the user's POV, one of

From the user's POV, one of the big issues of 'parmap' is that it isn't compositional unless it runs in parallel with the calling thread: there's a huge performance difference between sequentially applying 'parmap' on two vectors of ten items vs. first gathering those into one vector of twenty items. Many operations that are conceptually commutative or associative will not behave that way with regards to performance.

This distinction, if it exists, will cause headaches as developers stubbornly collect those vectors... and again, as they carefully restore them. We don't want 'conceptual simplicity' that leads to Ptolemaic epicycles. :)

Use of futures, however, is compositional in this sense. Developers can create compositional parallel strategies. They also gain more expressiveness, e.g. by passing the promises of some computations as arguments to others, or explicitly waiting for certain computations to complete before starting others.

Channels are just a generalisation of mutable references. Perfectly fine concept—doesn’t mesh well with Kitten.

References can work well in pure languages if given linear types or bound to a linear structure like an ST monad. Additionally, global variables would not be difficult to model in terms of a stack carrying a structurally typed record. I think you could make such things play nicely with Kitten. The real question is whether you want to allow your kitten near such things. ;)

par vs. fut

From the user's POV, one of the big issues of 'parmap' is that it isn't compositional unless it runs in parallel with the calling thread

Right, par would not allow the lifetime of a thread to implicitly escape the scope in which it was forked. A useful restriction to offer but maybe a silly one to impose.

there's a huge performance difference between sequentially applying 'parmap' on two vectors of ten items vs. first gathering those into one vector of twenty items

You could evaluate the two parmap calls in a par as well. par just enables concurrency; the scheduler manages parallelism.

Could a fut combinator take one function and evaluate it as though it were a par whose other argument is the current continuation?

The real question is whether you want to allow your kitten near such things. ;)

Maybe when it’s a little older. :)

You could evaluate the two

You could evaluate the two parmap calls in a par as well.

That wouldn't address the issue I'm trying to describe. It's the act of gathering inputs to a call-site (and the subsequent distribution of results) that fails to compose. It doesn't matter whether the call itself is 'par' or 'parmap'. The reason 'futures' help is that they obviate the need to gather everything you want parallelized at one site: you can start futures one at a time or a hundred at a time, and it results in the same potential for parallelism.

Could a fut combinator take one function and evaluate it as though it were a par whose other argument is the current continuation?

[edited] I suppose you could think about it that way, though it isn't clear to me what you gain by doing so. An important point is that futures also need some kind of 'future' data structure to serve as a synchronization point. The par you described doesn't have this because it joins immediately. Basically, the 'Future x' type could be implicit or explicit, but must exist. (I favor explicit, but implicit is nice for transparent parallelism.)

Gotcha

The par you described doesn't have this because it joins immediately.

Of course! I knew there was something wrong about my thinking there.

So a simple semi-explicit implementation comes to mind: fut takes a function, applies all its arguments, and gives you back a nullary function of the same result type. Calling this either blocks until the computation is finished, or retrieves the already computed result—which one is invisible to you as a caller. I just wonder how much value there is in having a separate wait operation. Effectful futures are also a little scary.

Effectful futures are also a

Effectful futures are also a little scary.

It might be best to avoid those to start, if you can typefully distinguish pure functions from impure ones. It's easier to add a safe way to address concurrent effects later, than to take away the ability to cause unsafe effects. Also, you can potentially use a cheaper implementation - or perhaps leverage resources you otherwise couldn't (like a cloud or GPU) - if you know you won't need effects. Meanwhile, pure futures would have real value for parallel computation.

An explicit 'wait' operation can be nice if you want to make it syntactically obvious in the source code when you might wait. But I suppose that is always the case with functions, right? You might not gain anything by it.

Related: you might also provide a `lazy` combinator that takes a pure function (and its arguments) then returns a memoized function. Might as well complete the set. :)

I do not understand what I am supposed to gain here

From reading the original blog entry and the articles here, I really do not get what I am supposed to gain with concatenative programming. When I played around with Forth a long time ago, I would break my code into multiple lines with comments on each marking the current state of the stack at the end of the line, until I eventually just gave up on that and used local variables for anything that is not to be consumed as quickly as it is put on the stack. (Without the comments, I found the code nigh-impossible to modify after the fact.) These days, working in Haskell, I tend to find more elaborate point-free notation beyond the likes of f . g . h painful to try to understand, much preferring an applicative style of programming for anything beyond the simplest code. And yes, while concatenative programming may make returning multiple values trivial, what really is the problem with tuples?

Conversely, I am not convinced by the supposed disadvantages of applicative programming. So what if many virtual machine implementations themselves are based on stacks and not ASTs - that is merely an implementation detail that I would prefer to not have to think about. And I prefer to give name bindings descriptive names beyond trivial cases (where then I do often use names like x, xs, n, f, g, etc.), as I am much more likely to remember what they are for when I come back to the code later (as I have found code where I did not take the time to come up with descriptive names difficult to understand when I came back to it after an extended period of time). In comparison, concatenative programming means no names to tell one what values are what in the first place! (Hence the necessity of extra comments for that purpose, and if one needs comments, what is one gaining here?)

Concatenative.org lists such

Concatenative.org lists such properties as concision, simple syntax and semantics, pipeline style, easy metaprogramming. People who stick with concatenative programming past the initial rejection stage tend to speak very highly of it. But I think there isn't any one thing that anyone would say is the 'killer app' of concatenative. It's a bunch of little (but important) things.

My favorite features are probably the easy refactoring of function internals, the ability to keep my environment clean, the emphasis on composition. I also like safety-by-construction, which is a hard feature to achieve using names (names require too much analysis, especially with substructural types or aliasing), and there are some nice applications of concatenative code for automatic code generation, genetic programming, and the like.

But one thing I dislike is trying to keep stack states in my head. I've gotten decent at it - up to three items on the stack and three in my hand (a second stack) is enough for many functions without factoring - but I really think this is something that could be and should be automated by an IDE. My intuition is that the difference will be like moving from a line editor to a text editor.

I mean, I write text all the time without naming each line or paragraph, and without keeping line numbers in my head. That's what programming should be like.

concatenative programming means no names to tell one what values are what in the first place! (Hence the necessity of extra comments for that purpose, and if one needs comments, what is one gaining here?)

Are you suggesting that applicative code avoids the need for 'extra comments'? I was under the impression that the general consensus is that parameter names are not sufficient documentation.

Anyhow, documentation certainly doesn't require names. It is often sufficient to describe the role of inputs, or sometimes to provide a small example program like: `12 4 div = 3` or `"text" [action] newButton`.

Also, if we avoid names for parameters and locals, what's left is behavior. With good word choice, subprograms can often suggest their intention to any reader. I.e. instead of `f . g . h` you might have `textToLines [wordCount] map listAverage`. Even if you don't know it's correct, you can probably guess what it should do. Since many functions are factored into smaller subprograms that each fit on one or two lines in concatenative languages, this "guess its meaning" approach often works well in conjunction with a little extra documentation.

Applicative code with descriptive names tends to become very bloated due to the cycle of names being used in large horizontal sub-expressions and sub-expressions being assigned to newly declared variables. IMO, it's also easy to get lost in all the syntax, and method names fail to dominate unless the IDE bolds them or something. It is really hard to write good self-documenting code using names for data - it focuses too much attention on the ingredients rather than the process.