LtU Forum

A case study of concatenative vs. applicative syntax design

I implemented a language a while ago.

Based on some feedback, I changed its syntax from concatenative:

- Stack-based, like Forth.
- Simpler, shorter code.
- Easy to refactor.

to applicative:

- JavaScript-like syntax.
- Uses pattern matching to define interaction rules.
- Supports currying, because parentheses are explicit.
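For concreteness, here is a minimal Python sketch (not the language itself) of the same arithmetic program in both styles:

```python
# Hypothetical toy evaluator for a concatenative (stack-based) program.
def run(program):
    stack = []
    ops = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}
    for tok in program.split():
        if tok in ops:
            b, a = stack.pop(), stack.pop()
            stack.append(ops[tok](a, b))
        else:
            stack.append(int(tok))
    return stack[-1]

run("2 3 + 4 *")   # concatenative style, no parentheses -> 20
(2 + 3) * 4        # applicative style, explicit parentheses -> 20
```

The concatenative form needs no parentheses but leaves the data flow implicit in stack order; the applicative form makes grouping explicit, which is what enables currying.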

But I am still not sure which is better.

What do you think?

Sorting the travelling salesman problem

Hi There,

Perhaps someone here might be kind enough to check out a paper I have published at [1] and provide some feedback.

It is an analysis of the travelling salesman problem and the sorting problem, examining the topologies of their solution spaces.


Thank you.

Context Sensitivity and relational comparison operators

I'm elbows-deep in a language implementation, and since (for a new experience) it's not some kind of LISP, I'm dealing with infix operators and implicit operator precedence. I solved a little problem and thought you all might find my rambling thoughts about it interesting.

I noticed that treating relational comparison operators strictly as context-free binary operators matches the way they are in many programming languages but does not match the way they're used in mathematics (and some other programming languages).

When I write
A < B < C < D

assuming we have a context-free parser and left-associative relational operators, this is
(((A < B) < C) < D)
and we wind up comparing the result of a comparison. But this is not what a mathematician means when they write that expression. Standard infix math notation wants it interpreted as
(A < B) && (B < C) && (C < D).

Engineering considerations say we want that with single subexpression evaluation and short-circuit semantics.
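Python happens to implement exactly this interpretation for chained comparisons (a < b < c desugars to a < b and b < c with b evaluated once). A sketch of the desired semantics, with thunks making the single evaluation and short-circuiting explicit:

```python
def chain_lt(*thunks):
    """Evaluate A < B < C < ... with single evaluation of each
    operand and short-circuiting on the first failed comparison."""
    prev = thunks[0]()
    for t in thunks[1:]:
        cur = t()             # each subexpression evaluated exactly once
        if not (prev < cur):
            return False      # short-circuit: remaining thunks never run
        prev = cur
    return True

chain_lt(lambda: 1, lambda: 2, lambda: 3)   # True
chain_lt(lambda: 2, lambda: 1, lambda: 3)   # False, third thunk never forced
```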

When generating code for a C program, the relop doesn't check whether the left subtree is another relop; all it knows is that the left subtree returned #false (or #true), so it compares its right subtree to that value. C treats booleans as a subrange of the numbers [0..1], so the comparisons have humorous results that require "inside" knowledge about the binary representation of booleans to comprehend. We get single subexpression evaluation but no short-circuit semantics.
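Python's bool is also a subclass of int, so forcing C's left-associative grouping reproduces the humorous result:

```python
# Mathematically, 5 < 3 < 4 is false (5 is not less than 3).
# C-style left-associative grouping compares a boolean against a number:
(5 < 3) < 4   # (5 < 3) is False, i.e. 0, and 0 < 4 is True
```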

In languages like Java, comparing booleans and numbers is a static type error, because static type errors are probably more useful than humorous results that require inside knowledge about the binary representation of booleans to interpret correctly. No single subexpression evaluation, no short-circuit semantics.

If it's a logic-descended language, the comparison may return NIL or a non-NIL value, and the non-NIL value it returns is the value of the last subexpression evaluated, meaning the value of the right subtree. This treats numbers as a subrange of boolean values (all of them are #true). And any relop whose left subtree is NIL returns NIL without evaluating its right subtree. An analogue of the mathematician's interpretation of chained relational operations falls out "naturally" and if you never do numeric operations on a value you got in a context that encouraged you to think of it as a boolean, you will never notice the difference. You also get single subexpression evaluation, you get short circuit semantics - all seems good!
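A sketch of that logic-language relop in Python, using None for NIL and a thunk for the right subtree so a NIL left operand short-circuits (the names here are illustrative):

```python
def lt(left, right_thunk):
    """NIL-propagating '<': returns the right operand on success,
    None (NIL) on failure or when the left subtree is already NIL."""
    if left is None:
        return None            # NIL left: right subtree never evaluated
    right = right_thunk()
    return right if left < right else None

lt(lt(lt(1, lambda: 2), lambda: 3), lambda: 4)   # -> 4, non-NIL: chain holds
lt(lt(3, lambda: 2), lambda: 5)                  # -> None: 3 < 2 already failed
```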

But it breaks static type checking. This means NIL is its own type, and relational operators have to check for it at runtime, and it can get returned to continuations that expect numbers. So now everything that uses booleans or numbers has to do runtime type checking. From a static type checking POV treating numbers as a subrange of booleans is even more expensive than treating booleans as a subrange of numbers.

When I'm traversing the abstract syntax tree after parsing, I can have different code templates to emit code for relational operations. I pick one depending on whether a subtree or the parent node is or is not another relop. So I get static type checking, I get the mathematician's interpretation of chained relational operators, I get the efficiency of caching a single evaluation instead of expanding a macro and evaluating something a second time.... all is good.
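CPython's compiler takes a broadly similar special-cased route: its parser folds the whole chain into a single Compare node rather than nested binary relops, and code generation then duplicates and tests each middle operand once. For instance:

```python
import ast

# Python parses the whole chain as one Compare node, not nested relops:
node = ast.parse("A < B < C < D", mode="eval").body
type(node).__name__        # 'Compare'
len(node.ops)              # 3 operators: Lt, Lt, Lt
len(node.comparators)      # 3 right-hand operands: B, C, D
```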

But now I broke referential transparency. Identical subexpressions encountered in different contexts mean different things. All the type checking can be done statically, but the relational ops other than the root of the chain are making a choice between jumping to their own continuation site in their parent relational op (to return a number) or to the boolean continuation at the call site of the whole chain (to return #false). Only the root of the chain is special; it has no numeric continuation, and can jump instead to the exit continuation to return #true. This is statically typed return polymorphism.

I was considering whether to be upset about breaking referential transparency, but then I realized people are using languages with exceptions all the time without complaining about it, so referential transparency can take a walk, I guess, if it gets us static type checking.

Typesetting a Type System with Color-Coding

I've recently been working on formalizing the type system for one of my programming languages. While writing out the rules for inference I found it helpful to use both traditional type-syntactic inference and some categorical language. The back-and-forth is so common that I have considered color-coding terms to indicate the category in which each corresponds to an object or morphism.

My question for LtU is whether color-coding seems goofy or maybe helpful in some contexts. I have considered alternatives and I think there are three options:

1) Color-code terms by Category
2) Add text describing which terms correspond to which Category
3) Alter the term/type syntax to make category an explicit syntactic property

The ALTernative programming language

I've just finished documentation on my new programming language ALT.

It took me a lot of attempts (roughly 20 implementations in Scala and TypeScript) to come up with the current semantics. I would be grateful for some feedback.

PL Tea event today 26 July @ 14:00 New York time

Info should be available at:

CFP: PLOS '23: 12th Workshop on Programming Languages and Operating Systems

Those of you who apply advanced PL ideas to operating systems might be interested in the upcoming PLOS 2023 workshop. See the abbreviated CFP below, and please consider submitting a short paper!

Thanks! ---



12th Workshop on Programming Languages and Operating Systems (PLOS 2023)

October 23, 2023

Koblenz, Germany
In conjunction with SOSP 2023

Paper submission deadline: August 4, 2023

Historically, operating system development and programming language development went hand-in-hand. Today, although the systems community at large retains an iron grip on C, many people continue to explore novel approaches to OS construction based on new programming language ideas. This workshop will bring together researchers and developers from the programming language and operating system domains to discuss recent work at the intersection of these fields. It will be a platform for discussing new visions, challenges, experiences, problems, and solutions arising from the application of advanced programming and software engineering concepts to operating systems construction, and vice versa.

Please visit the Web site for more info:

Programming languages with full-Unicode syntax and identifiers are surprisingly hard to do well

I am working on a programming language. I want to fully integrate Unicode - preferably the NFC/NFD repertoire, with "canonical" decompositions only.
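For illustration, canonical equivalence is why the compiler (or the lexer) should normalize identifiers, e.g. to NFC, before comparing them; a minimal Python sketch:

```python
import unicodedata

composed = "\u00e9"        # 'é' as a single code point
decomposed = "e\u0301"     # 'e' followed by COMBINING ACUTE ACCENT
composed == decomposed                                  # False code-point-wise
unicodedata.normalize("NFC", decomposed) == composed    # True after NFC
```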

At the same time I don't want the bidi algorithm to be used to display code in a deceptive order on the page. But in order to achieve this, I have to require LTR control characters in the program text after every RTL character where the following character is bidi "neutral" or "weak." Is that a mode I can set in any programming editor in wide use, or do I have to implement my own editor? Adding LTR controls in a separate step (say, with a sed script) means there's an extra step before I can see, while editing, the same version of the code the compiler will be seeing.
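As a preprocessing sketch (rather than an editor mode), the rule described above might look like this in Python, using the bidi classes exposed by unicodedata; treating everything outside the strong classes as "weak or neutral" is an assumption here:

```python
import unicodedata

LRM = "\u200e"             # LEFT-TO-RIGHT MARK
STRONG = ("L", "R", "AL")  # strong bidi classes

def insert_lrm(text):
    """Insert an LRM after each strong-RTL character whose
    following character is bidi weak or neutral."""
    out = []
    for i, ch in enumerate(text):
        out.append(ch)
        if unicodedata.bidirectional(ch) in ("R", "AL"):
            nxt = text[i + 1] if i + 1 < len(text) else ""
            if nxt and unicodedata.bidirectional(nxt) not in STRONG:
                out.append(LRM)
    return "".join(out)
```

So a Hebrew alef followed by a neutral "(" gains an LRM between them, while purely LTR text passes through unchanged.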

At the same time I don't want "lookalike" characters used to display deceptive identifiers. Nobody can tell by looking whether 'a' is Latin or Cyrillic, or whether 'A' is Latin or Greek, and I don't want programmers tearing their hair out trying to understand why the variable they thought they just initialized is holding some value set by malevolent code somewhere out of sight, or why a perfectly innocent "Jane_Doe" keeps getting blamed for the fraudulent transactions of someone else whose name appears to be spelled exactly the same. The most straightforward precaution here is to ban identifiers that contain alphabetic characters from more than one script, but it seems a lot like using a sledgehammer to kill cockroaches. A less restrictive rule would allow mixing scripts but not if you use any letters which are confusable between those scripts - for example you could mix Latin and Cyrillic if you do it without using any character that looks like "a" (or other characters that could be either) or you could mix Latin and Greek if you do it without using any character that looks like "A" (or "B", or "X", or other characters that could be either). But this makes the identifier syntax rules complicated to check and hard to easily express.
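The mixed-script ban (the sledgehammer version) is at least easy to sketch. Python's unicodedata has no script property, so this approximation infers the script from the character-name prefix; production code would want the real Unicode Script property:

```python
import unicodedata

SCRIPTS = ("LATIN", "GREEK", "CYRILLIC")  # extend as needed

def scripts_of(ident):
    found = set()
    for ch in ident:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            for script in SCRIPTS:
                if name.startswith(script):
                    found.add(script)
    return found

def mixed_script(ident):
    """Reject identifiers mixing alphabetic characters from
    more than one script."""
    return len(scripts_of(ident)) > 1

mixed_script("Jane")        # False: all Latin
mixed_script("J\u0430ne")   # True: Cyrillic 'а' hiding among Latin letters
```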

Just two of the *MANY* issues that need to be addressed in order to allow a fully unicode-enabled programming language that's not a security or usability disaster.

I used to hate Unicode a lot more than I still do. These days I recognize it as a massive hairball, but I'm not really angry about it any more; it's just one more piece of legacy design that clearly was NOT intended for the kind of use I want to make of it. So it's massively difficult to use, leaky, flabby, illogical, promotes insecurity worse than a language without array bounds checking, etc, but I guess I've just come to accept it, and I'm finally getting around to trying to overcome the problems and do something worthwhile with it anyway.

Egel v0.1.8 (beta) released - do syntax

Another Egel release. New in this patch: do syntax for applications, and some UTF bugs squashed.

Egel is a small functional scripting language. What sets it apart is that at any time during execution, the program, or any program fragment, forms a directed acyclic graph. That makes concurrency, serialization, and shipping of values more trivial than in most other languages, and I want to exploit that one day to make mobile code a breeze to implement. Shoot any combinator anywhere!

Meanwhile, the Egel language supports only the bare minimum as a front-end to a graph-rewriting semantics. At the same time, I sometimes implement a little syntactic sugar to make programming more pleasant.

The latest addition is do-syntax. Like in many functional languages, you can set up application chains. For example, 0 |> ((+) 1) |> ((*) 2) will reduce to 2.

I found a slight need to facilitate this kind of programming with do: a do expression abbreviates a chain. For example, def f = do ((+) 1) |> ((*) 2) is sugar for def f = [ X -> 2 * (X + 1) ]. I have found this very convenient while programming solutions to last year's Advent of Code.
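For readers unfamiliar with the notation, the |> chain is just left-to-right function application; a rough Python analogue of the do sugar (the names here are mine, not Egel's):

```python
from functools import reduce

def do(*fns):
    """Abbreviate a |> chain: compose functions left to right."""
    return lambda x: reduce(lambda acc, f: f(acc), fns, x)

f = do(lambda x: x + 1, lambda x: x * 2)
f(0)   # ((0 + 1) * 2) = 2, matching 0 |> ((+) 1) |> ((*) 2)
```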

Of course, this isn't Haskell's monadic do syntax, but as we all know, applicatives are as good as monads. Or are they?
