LtU Forum

My article on state machines and DSL evolution

I've been unhappy with with state machines, activity diagrams, and BPMN as tools for modelling processes for long time. In the article linked below I've used the same principles as were used for moving from flat languages (Asm, Fortran 66, BASIC, etc) to structured programming to create a behavior model that is some kind of behavior equivalent of Martin Fowler state machine model.

(Teaser) the end result for the sample is the following:

    ESCAPE doorOpened {
        DO lockPanel, unlockDoor
        WAIT doorClosed
        ALL {
            WAIT lightOn
        } AND {
            WAIT drawOpened
        DO lockDoor, unlockPanel
        WAIT panelClosed

Do we need exactly two binding constructs?

I'm recently thinking about the relation between objects and lexical environment. Objects, especially under a prototype-based object system, looks suspiciously similar to a lexical environment:

slot-ref <-> environment-ref
slot-set! <-> environment-set!
add-slot! <-> environment-define
parent(s)-of/delegate(s)-of <-> environment-parent(s)

However, one problem remains in the way of unifying those 2 binding constructs completely: the slot-scoping problem. If I simply use symbols (like the case for environment) to designate slots, there's nothing to prevent two authors to come up with the same name (say 'x) and they can clash, especially in the presence of multiple delegation/inheritance. Therefore I figure I must use slot objects rather than symbols to designate slots:

(within-environment user-1
   (define x (make-slot)))
(within-environment user-2
   (define x (make-slot))
   (make-object x 1 user-1:x 2))

and... now environments bind symbols to slot objects, and objects bind slot objects to values.

This all looks fine, except that it makes me itch that I need to have two almost identical constructs, and I can't unify them into one! Are those two binding constructs exactly what we need, no more, no less?

Cicada language -- a new dependently typed language

Cicada language is a
dependently typed programming language and a
interactive theorem prover.

The aim of cicada project is to help people understand that developing software and developing mathematics are increasingly the same kind of activity, and people who practices these developments, can learn from each other, and help each other in very good ways.

Homepage at:

CUE: An open-source data validation language

There are two core aspects of CUE that make it different from the usual programming or configuration languages:

- Types are values
- Values (and thus types) are ordered into a lattice

These properties are relevant almost to everything that makes CUE what it is. They simplify the language, as many concepts that are distinct in other languages fold together. The resulting order independence simplifies reasoning about values for both humans and machines.

It also forces formal rigor on the language, such as defining exactly what it means to be optional, a default value, or null. Making sure all values fit in a value lattice leaves no wiggle room.

The application of set-theoretic types is getting more and more mainstream as they turn out to be very effective in (partially) typing dynamic languages (i.e. Typescript). Other popular examples are Scala(3) and Kotlin, together with other (less mainstream) examples such as Ceylon, Typed Racket and the Avail programming language.

I've dabbled in Avail a few years ago, and was very impressed by Avail's type driven compiler, but not so much by its (extreme) DSL support. In contrast to Avail, I believe CUE is much more pragmatic while not allowing any 'tactical' compromises.

CUE has its origins at Google: the ability to compose CUE code/specifications is really useful at the scale that Google operates. The creator of CUE - Marcel van Lohuizen - has recently quit his job at Google to become a full-time developer of CUE. I think Marcel has created something really interesting, so I understand his decision!

Shen Standard Library

The documentation can be viewed on Download on


Trojan Source: Unicode Bidi Algorithm abuses in source code

A recent chunk of research caught my eye.

The skinny is that the unicode bidi algorithm can be used to do all kinds of mischief against the correspondence between the visual display of code and the semantic effect of it. To the extent that the manipulation of code semantics is almost but not quite arbitrary. It's exceedingly easy to use this technique to hide trojan-horse type vulnerabiity in almost any language that accepts unicode source.

My own analysis of this is that this comes about because the 'tree' or 'stack' of bidi contexts is not congruent to the 'tree' or 'stack' of semantic source code contexts.

In short no bidi context can extend beyond the bounds of the most specific syntactic context in which it was created. And that includes 'trivial' subtrees like comments and string constants. As the compiler exits a clause, it must report an error if the bidi state that existed when it entered that clause has not been restored.

And even within the subclauses, keywords and structure must begin and end with the same bidi context. For example if the language has an 'if' subclause that's

'if ('+predicate +') then' + consequent + 'else' + alternate 

Then the predicate, consequent, and alternate clauses can push/pop bidi states within themselves, but keywords and parens must all begin and end in the exact same bidi state. And that bidi state - the one that existed before the parser started reading the subclause - must be restored when the parser leaves the subclause.

Right now most unicode-accepting programming systems are treating bidi state as irrelevant and bidi override control characters as whitespace. This means that code that looks exactly the same in an editor can be read in different sequence and have vastly different semantics.

Forcing bidi level to remain congruent to syntax level through the program means that a program that displays in a particular way either has a single valid semantics according to its sequence order, or the parts you see are in some non-obvious sequence order and it is therefore a syntax error.

The Book of Shen 4th edition is now available online.

The Book of Shen 4th edition (TBoS) is now available online. This is part of an open science
initiative which will place over 1000 pages of computer science online. I'd like to thank
our sponsors whose support has made this possible.

TBoS is based around the S series of kernels, of which the latest S31 went open source.

Relevant Links

Home page
S Series Kernel
Open Science
TBoS Online
TBoS Amazon


Egel 0.1 (beta)

Egel is an untyped eager combinator scripting language. I recently decided to go beta, the thing is now ready for shipping; the language seems stable but libraries are either undeveloped or subject to change. I am now actively looking for users and feedback.

It should compile on most unixes and macos.

Cheers, Marco

We've got big balls ... of mud ...

The most popular and enduring architecture for all large projects is the one called "Big Ball Of Mud."

It's what happens when requirements grow in unexpected directions that cut directly across what were originally conceived as architectural boundaries, and met with adhoc code that connects parts of the program that the original architecture would never have allowed to touch each other. Then the whole thing is maintained for ten years, in the face of steadily expanding and frequently changing use cases and problems and by a host of people many of whom don't really know what's going on in the rest of the program, then gets bought by a competitor who starts reimplementing the whole thing in a different language and with a "proper architecture" but runs out of budget for that so they only manage to reimplement about a third of it, then link the stuff in the new language together with the stuff in the original language (and libraries in a third, not that anyone's looking at libraries at this point) and hack together the logical mismatches between "old" and "new" code by running perl scripts on the database. .....

And it just goes on like that - year after year after year.

Now imagine it's 2037 or whatever and you have a system where this process has taken the architecture to its logical conclusion. It's not just a big ball of mud any more, it's a planet of mud. There is no solid ground anywhere in any direction. Nothing like any kind of rational architecture (except Big Ball Of Mud architecture) has emerged.

What can a programmer who is responsible for some small part of this giant mudball - say, one file of code - do? In the 'thinking globally and acting locally' sense, because obviously one programmer on a whole planet of mud is in no position to impose new architecture on the whole and doesn't even have the information needed to decide a new architecture for the whole.

What modest steps can one programmer take to smooth out, and perhaps dry a little, their particular little bit of mud? Ideally, they should be low-effort steps that don't require all that much reaching into all the other bits of mud. And ideally, if all the individual programmers all over the mud planet did them in the course of maintaining their little bits of mud, it should eventually result in the simplification of the whole insofar as possible, and the emergence of some kind of rational architecture.

It might be an architecture that no one of the programmers would have designed. It might be an architectural paradigm that humans haven't even discovered yet. It might not be particularly elegant. But after a lot of iterations by people working all over the program, what small, feasible, local actions in a big ball of mud produce a program with more tractable organization?

Are there any metrics that a monitoring system can auto-generate that could give the programmer a "score" for whether they're improving or degrading the process of creating some kind of code organization?

Bespoke: live graphical language for granular synthesis

This post is a slight provocation in that the inventor of Bespoke ("A modular DAW...") does not describe or promote his work as a programming language.

In Bespoke, programmer/composer/performers construct live dataflow graphs. Edges are each one of just a few types: MIDI signals (an interchange standard for real-time control of electronic instruments), digitized audio signals, and control signals.

In contrast to edge types, nodes in the graph vary without bound, each transforming inputs to outputs in specialized ways. Nodes have internal state and can expand the amount of state.

Composites of nodes can be regarded as themselves nodes (conceptually and practically).

Because programmers are meant to be able develop and modify programs while they are running, the types of data flowing on edges are necessarily manifest types. Type checking (trivially matching the types of data sources and syncs) happens on the fly as code is composed. One can't draw a "wrongly typed" program in this system -- midi outputs go to midi inputs and so forth.

The separation of types such as audible, control, and MIDI signals arises solely due to the kinds of physical equipment used to receive or output such signals. E.g., if you try to play midi wire signals over a speaker, there's little if anything to hear. One can easily imagine a slight generalization of bespoke in which, at core, all packets are untyped.

Packet transmission is necessarily noisy in the sense that the language provides no guarantees about packet delivery. For example, an audio signal might arrive at a different sampling rate and width than it was transmitted with. Packets may be dropped or spurious ones inserted. As a practical matter, stronger guarantees about packet transmission can be promised by implementations.

In the bespoke we see, there is a global clock and packets are timestamped though they may be transmitted and received before their timestemp time. This aspect of packets is specific to the music-making domain though it presumably has wider applicability. One can easily imagine a bespoke core that lacks that kind of synchronization other than that transmissions temporally precede receptions.

XML feed