LtU Forum

how to design PL support for effects emerging from parallel non-determinism?

Let's make an on-topic place to discuss programming language relevant issues in concurrency and non-determinism. Software development in general isn't really on topic. It really needs to be about an aspect of PL design or implementation, or else LtU isn't suitable.

It seems folks designing a PL would like a language to be suitable for performance work, in a world where scaling better may imply doing more things in parallel, to get more hardware usage per unit of time in lieu of processors actually running faster. The plan is something like: well if cycles have stalled we can just use more cores. In general this seems to lead to unpredictability, which perhaps might be addressed by PL tech, somehow or other.

Something is missing in how developers specify what they think code will do, in the large. It's a common belief it will be obvious. ("It should be obvious what the code does from reading it, so comments are superfluous.") This is slightly like thinking other people will recognize a tune you have in mind when you drum out its rhythm with your fingers, when in fact they will only hear you drumming your fingers. A pattern you expect to be obvious won't be.

Maybe more than one layer of grammar would help. (Just an idea.) In addition to local language syntax, you could specify larger patterns of what will happen, in terms of expected patterns in future inter-agent transactions, whether they be objects or something else. There seems lack of current art in capturing developer intent about shape of event sequences. Sometimes I suspect a few people intend to fix this with determinism in language behavior, which doesn't seem a good fit for fallible distributed internet components.

I have nothing to post unless a remark catches my eye. If I have something interesting to say, I will. (I don't blog any more, and I'm all but unreachable except here, unless you call my phone or email my work address. My online presence is near zero. I don't even check LinkedIn.)

What specifically would you do to a programming language to address the emergent elephant that results from code specifying tail, legs, trunk, etc?

PHOG: Probabilistic Model for Code

A paper presented as a poster at ICML. Abstract:

We introduce a new generative model for code called probabilistic higher order grammar (PHOG). PHOG generalizes probabilistic context free grammars (PCFGs) by allowing conditioning of a production rule beyond the parent non-terminal, thus capturing rich contexts relevant to programs. Even though PHOG is more powerful than a PCFG, it can be learned from data just as efficiently. We trained a PHOG model on a large JavaScript code corpus and show that it is more precise than existing models, while similarly fast. As a result, PHOG can immediately benefit existing programming tools based on probabilistic models of code.

Also comes with reviews.

Transpiling a dynamically typed language to a statically typed language

Hi, I have been pondering what I want to do for my master
thesis (2+ years away) and I have an idea. I am writing to ask if
the question has a known answer or is even theoretically possible
to conduct.

My idea is to write a source-to-source compiler from a Lisp_2 to
Rust, but I do not know if it is possible to transpile from a
dynamic language to a statically typed language.

Below is a rough draft for the introductory chapter. Beware that
my writing style implies that I already know that producing such
a transpiler is possible, but I do not.

Kind regards,
Filip Allberg

A Tale of Two Languages

Imbuing Lisp with the thread-safety guarantees of Rust by using a source-to-source compiler.


This short introductory chapter starts by presenting the question we are trying to answer, followed by a summary of our suggested solution that provides a definitive answer to the our question. We end this chapter ends with an overview of the subsequent chapters.

The question

By compiling a language X into a target language Y through the use of a source-to-source compiler (also known as a transpiler) the executable program (created by an additional compilation step) inherits certain qualities from the language Y, particularily its execution model.

Transpiling X to a target language Y may put restraints on the definition of X and possibly have a negative impact on the runtime environment of the final executable certain desireable traits may be inherited from Y, for an example cross-platform portability.

A commonly selected target language for a transpiler is the programming language "C". Since C is available on practically any machine, the C-code that is generated from a program written in X is likely to be portable as well.

Moreover, any optimizations that the C-compiler can achieve are automatically and implicitly available to the source language X. C-compilers excel at carrying out a great many optimizations with respect to allocating registers, laying out code, and choosing modes of address. As designers of the source language we are liberated of such issues.

In the case of C there are numerous pain-points such as pointers running amok and its lack of built-in garbage collection. If the source-language is a garbage-collected language, that is type-safe, such problems have to be addressed in the source-to-source compiler.

Furthermore there are certain limits imposed on our source language by the limitations of C has, such as a maximum of 32 arguments in function calls or less than 16 levels of lexical blocks.

Inherently, each source language presents its own set of advantages and negative consequences with respect to the source language.

The Rust programming language, which is a systems programming language that is a contender in the same domain as C and C++, has the desireable attribute of guaranteeing thread-safety and threads without dataraces.

The question then is, can a specific source language compiled into Rust be unequivocally determined to be thread-safe through the merits of the source-to-source compiler alone?

The solution and answer

In this text we present a source-to-source compiler from a Lisp2 dialect to Rust whereby we can, through the use of the Rust compiler rustc, compile the generated code. Using the generated programs, we can conclude, with the addition of formal theoretical deductions, that valid programs written in the aforementioned Lisp dialect are thread-safe and provides threads without data-races.


Since there is no additional body of work this subsection cannot be fully written, but a rough layout of the remainder of the text would probably look like:

Chapter 1. An overview of Lisp and a formal description of our dialect as well as an overview of our concurrency model (not sure if the latter is necessary)

Chapter 2. A precursory introduction to Rust with references to papers demonstrating the validity of the claims that Rust provides thread safety guarantees and data race guarantees as this is a corner stone to our work.

Chapter 3. Examples from concrete programs written in the Lisp dialect that will serve as benchmarks when evaluating the code generated by our compiler.

Chapter 4. An overview of the implementation of the compiler that also discuss alternative implementation choices and why they were not chosen

Chapter 5-(maybe several chapters): Explanations describing the parts of the different compilation phases. It is probable that we will encounter some automatas here.

Chapter 6: Code generation optimizations and translation strategies to portable Rust code.

Chapter 7: Evaluation of benchmarks

Chapter 8: Discussions of possible applications, such as tiering source-to-source compilers in a layered fashion to successively imbue desireable traits from each language in the compilation sequence to the top-most source language.

Appendices: Complete source code for the compiler

Viability of a static type system (like ML) for a relational language?

I'm building a relational language. I like the idea of a static type system but after toying for a while with the interpreter I have wonder how feasible is it.

Here, I think a "type" is the whole head of the relation (even scalars as "1" are relations) and most thing fell fine, but the join operator cause trouble: It could merge heads (or not) depending in the kind of join. So, I'm saying that I feel the relational mode generate on the fly new types and can't be specified before.

So, If i have a query like:

city = Rel[id:Int name:Str population:Int]

city where .id=1
city select [.id]
city select [toString(.id)]
city union city

I see the type of each relation change. Is impractical to type by hand each combination, but I like the idea of type most of it. But can the compiler be made to work with this similar to ML?

How work the sql languages, them are dynamic?

Recursive types

In some literature, including Wikipedia, there is a distinction I don't quite understand made between iso-recursive and eqi-recursive typing. In both cases, however, there is a concept of unrolling a recursive type by replacing the fixpoint variable with the recursive term.

I believe this idea is suboptimal and should be replaced by a better one I shall describe below. I am interested if this has been done before, and if my analysis (and the concrete algorithm I have written) is correct.

The problem with iso-recursion is that it induces an equivalence relation on type representations, and this a partition, which fails to place eqi-equivalent types in the same class. Here is an example:

([] -> [([] -> fix-2)])
([] -> [fix-2]

Here [..] denotes an n-ary tuple type, -> is a function type, and fix-2 is a fixpoint whose binder is implicitly 2 levels up in the structure. These encodings represent the same type, the second one is recursive but its unrolling is not equal to the first type.

I have found a solution to this problem: instead of a full unrolling, we can just unroll one level. This is much better! In particular my re-rolling algorithm subsumes the standard re-rolling by a full circle, and also produces a unique canonical representative of the class. It should work for monomorphic and first order polymophic types, at least if there is only a single fixpoint.

I will not give the algorithm here but I will give a description because it is fun! Consider a piece of string representing your type, with coloured segments for each level. The end of the string is knotted around the string some way up. The knot is called the fixpoint and the place it is knotted to is called the binding point.

The algorithm for re-rolling the string simply rolls the circle up the string one segment and compares the colours. If they're equal, roll up another segment. Continue until you either run out of string, or you get a mismatch, in which case backtrack by one. You can finish by cutting the matching tail off and doing a new knot.

Unrolling is the same. Just unroll ONE segment to shift the fixpoint binder down out of the way.

A standalone demo in Ocaml is here:

PLDI 2016 Proceedings now available on-line, free for 3 weeks

The proceedings for PLDI 2016 - taking place the week of June 13 in Santa Barbara - are now available on-line, free to all until 2 weeks after the conference. Download now! Lots of good stuff there.


Finally, my prayers are being heard. The question was "get us web enabled bytecode" and the answer came from above in a form of W3C group effort, WebAssembly.

Though it encapsulates low level jump commands into structures controlled by labels (I'd like to leave it at the low level for maximum freedom), it is more or less what I'd expected from a bytecode that could be ran inside browser. It's enough assembler like language to leave a lot of space for specific implementation of different programming language architectures, while retaining speed of native implementations. Maybe it is an overkill for a bytecode language to automatically deal with functions (see "call" command), but it can't hurt, we dont have to use it, we can maintain our own call stack if we want to replace existing one.

It is ought to have continuous memory handler that indexes space from 0 on. This memory can even grow on the fly, if we want to. This is just what I need, encapsulation of memory management.

If this is what I'm thinking it is (nearly 1:1 mapping from bytecode to machine code upon compilation), I can say only one thing: "thank you, world :)"

Stack of regions for managing effects?

Tofte and Talpin used stack of regions for managing memory. Can you recommend papers where per-frame regions/heaps are used for managing effects?

Metaprogramming and Free Availability of Sources

Metaprogramming and Free Availability of Sources by François-René Rideau

This is an older paper about metaprogramming and it ties into open source development process and why metaprogramming requires free/open source software.

The beginning of the article outlines a list of types of metaprograms such as compilers/translators, phase splitters, code walkers, meta compilers,explainers, etc. Later on it says that metaprogramming is required as problem complexity increases and there are more actors/stakeholders involved.

In the traditional development process, that excludes metaprogramming, the near totality of the code constituting software is hand-typed by humans; the details of execution, the overall behavior, everything must follow directly from human thought; every stage of the development, every modification is the work of man. Now, the catch is that sooner or later, some new functionalities to add to a software project will require global architectural changes of some depth: all the global modifications of the program will have to be performed by hand, with all the inconsistency problems unwillingly introduced by these changes, that induce numerous bugs and cost lots of iterations of development (for a publicly documented such phenomenon, see the occasional API changes in the Linux kernel; for a spectacular example, see the explosive bug of the first Ariane V rocket). The alternative to these modifications is a complete reimplementation, whose cost is that of a rewrite from scratch. All in all, the incremental cost of traditional development is proportional to the size of code that differs between two iterations of the process.

Let us now make metaprogramming part of the development cycle. Since metaprogramming allows arbitrary processing of code by the computers themselves, it enables the programmer who has to face a structural change in his program to perform semi-automatically any task of instrumentation or transformation of his program, so as to make it conform the new architecture, to check his new code with respect to declared invariants, to enforce consistency of various parts of his program.

In statically typed languages or at least in better IDEs, it's possible to perform various refactorings across a codebase which can constitute metaprogramming in a crude respect:

tiny doses of metaprogramming used in traditional software projects are a determining factor for the continuation of these projects: choice, often static, of a development environment; filtering of programs with static analyzers, checkers of more or less sophisticated invariants, or other kinds of lint; use of editors that manage the indentation and structure of programs, that they can colorize and fontify; occasional use of search-and-replace tools on the source code; in the case of more daring hackers, “macros” and “scripts” written in elisp, perl or other languages, to edit and manipulate sources.

The question I pose to LtU is what are the ways in which metaprogramming can be introduced in the development of proprietary programs in mainstream languages such as JavaScript and Java.


I'm designing a Turing complete utility microlanguage that has a minimal set of commands. Chosen commands should on the one side have ability to compile to fast assembler code, and on the other side sould be enough descriptive so one can directly write sane code with them.

The microlanguage would be a part of my bigger project metalanguage that compiles any other languages to the microlanguage which at the end (jit or cached) compiles either to browser-executable (html + javascript), either to native code. Because it wouldn't really be used for high level programming, but only as underneath intermediate language (analogous to some bytecode language), it shouldn't have all heavy weight addons like objects, structs or reactive constructs, but these addons should be able to compile from a guest language to the host microlanguage. It should be also easy to port the microlanguage to platforms like OSX, Windows, Linux or Android.

So I'm thinking about a minimal set of commands. Somehow I think it should be imperative language, because microprocessors work that way at low-level understanding. I was thinking of supporting only these:

  • variable assignment and arithmetic
  • pointers and arrays like in C
  • if branching
  • for and while loops with labels
  • defining functions with stack support
  • assembler passing engine like in C for system programming

Do these commands sound cool or there is something that can be changed for better speed/usability performance?

XML feed