LtU Forum

Viability of a static type system (like ML) for a relational language?

I'm building a relational language. I like the idea of a static type system but after toying for a while with the interpreter I have wonder how feasible is it.

Here, I think a "type" is the whole head of the relation (even scalars as "1" are relations) and most thing fell fine, but the join operator cause trouble: It could merge heads (or not) depending in the kind of join. So, I'm saying that I feel the relational mode generate on the fly new types and can't be specified before.

So, If i have a query like:

city = Rel[id:Int name:Str population:Int]

city where .id=1
city select [.id]
city select [toString(.id)]
city union city

I see the type of each relation change. Is impractical to type by hand each combination, but I like the idea of type most of it. But can the compiler be made to work with this similar to ML?

How work the sql languages, them are dynamic?

Recursive types

In some literature, including Wikipedia, there is a distinction I don't quite understand made between iso-recursive and eqi-recursive typing. In both cases, however, there is a concept of unrolling a recursive type by replacing the fixpoint variable with the recursive term.

I believe this idea is suboptimal and should be replaced by a better one I shall describe below. I am interested if this has been done before, and if my analysis (and the concrete algorithm I have written) is correct.

The problem with iso-recursion is that it induces an equivalence relation on type representations, and this a partition, which fails to place eqi-equivalent types in the same class. Here is an example:

([] -> [([] -> fix-2)])
([] -> [fix-2]

Here [..] denotes an n-ary tuple type, -> is a function type, and fix-2 is a fixpoint whose binder is implicitly 2 levels up in the structure. These encodings represent the same type, the second one is recursive but its unrolling is not equal to the first type.

I have found a solution to this problem: instead of a full unrolling, we can just unroll one level. This is much better! In particular my re-rolling algorithm subsumes the standard re-rolling by a full circle, and also produces a unique canonical representative of the class. It should work for monomorphic and first order polymophic types, at least if there is only a single fixpoint.

I will not give the algorithm here but I will give a description because it is fun! Consider a piece of string representing your type, with coloured segments for each level. The end of the string is knotted around the string some way up. The knot is called the fixpoint and the place it is knotted to is called the binding point.

The algorithm for re-rolling the string simply rolls the circle up the string one segment and compares the colours. If they're equal, roll up another segment. Continue until you either run out of string, or you get a mismatch, in which case backtrack by one. You can finish by cutting the matching tail off and doing a new knot.

Unrolling is the same. Just unroll ONE segment to shift the fixpoint binder down out of the way.

A standalone demo in Ocaml is here:

PLDI 2016 Proceedings now available on-line, free for 3 weeks

The proceedings for PLDI 2016 - taking place the week of June 13 in Santa Barbara - are now available on-line, free to all until 2 weeks after the conference. Download now! Lots of good stuff there.


Finally, my prayers are being heard. The question was "get us web enabled bytecode" and the answer came from above in a form of W3C group effort, WebAssembly.

Though it encapsulates low level jump commands into structures controlled by labels (I'd like to leave it at the low level for maximum freedom), it is more or less what I'd expected from a bytecode that could be ran inside browser. It's enough assembler like language to leave a lot of space for specific implementation of different programming language architectures, while retaining speed of native implementations. Maybe it is an overkill for a bytecode language to automatically deal with functions (see "call" command), but it can't hurt, we dont have to use it, we can maintain our own call stack if we want to replace existing one.

It is ought to have continuous memory handler that indexes space from 0 on. This memory can even grow on the fly, if we want to. This is just what I need, encapsulation of memory management.

If this is what I'm thinking it is (nearly 1:1 mapping from bytecode to machine code upon compilation), I can say only one thing: "thank you, world :)"

Stack of regions for managing effects?

Tofte and Talpin used stack of regions for managing memory. Can you recommend papers where per-frame regions/heaps are used for managing effects?

Metaprogramming and Free Availability of Sources

Metaprogramming and Free Availability of Sources by François-René Rideau

This is an older paper about metaprogramming and it ties into open source development process and why metaprogramming requires free/open source software.

The beginning of the article outlines a list of types of metaprograms such as compilers/translators, phase splitters, code walkers, meta compilers,explainers, etc. Later on it says that metaprogramming is required as problem complexity increases and there are more actors/stakeholders involved.

In the traditional development process, that excludes metaprogramming, the near totality of the code constituting software is hand-typed by humans; the details of execution, the overall behavior, everything must follow directly from human thought; every stage of the development, every modification is the work of man. Now, the catch is that sooner or later, some new functionalities to add to a software project will require global architectural changes of some depth: all the global modifications of the program will have to be performed by hand, with all the inconsistency problems unwillingly introduced by these changes, that induce numerous bugs and cost lots of iterations of development (for a publicly documented such phenomenon, see the occasional API changes in the Linux kernel; for a spectacular example, see the explosive bug of the first Ariane V rocket). The alternative to these modifications is a complete reimplementation, whose cost is that of a rewrite from scratch. All in all, the incremental cost of traditional development is proportional to the size of code that differs between two iterations of the process.

Let us now make metaprogramming part of the development cycle. Since metaprogramming allows arbitrary processing of code by the computers themselves, it enables the programmer who has to face a structural change in his program to perform semi-automatically any task of instrumentation or transformation of his program, so as to make it conform the new architecture, to check his new code with respect to declared invariants, to enforce consistency of various parts of his program.

In statically typed languages or at least in better IDEs, it's possible to perform various refactorings across a codebase which can constitute metaprogramming in a crude respect:

tiny doses of metaprogramming used in traditional software projects are a determining factor for the continuation of these projects: choice, often static, of a development environment; filtering of programs with static analyzers, checkers of more or less sophisticated invariants, or other kinds of lint; use of editors that manage the indentation and structure of programs, that they can colorize and fontify; occasional use of search-and-replace tools on the source code; in the case of more daring hackers, “macros” and “scripts” written in elisp, perl or other languages, to edit and manipulate sources.

The question I pose to LtU is what are the ways in which metaprogramming can be introduced in the development of proprietary programs in mainstream languages such as JavaScript and Java.


I'm designing a Turing complete utility microlanguage that has a minimal set of commands. Chosen commands should on the one side have ability to compile to fast assembler code, and on the other side sould be enough descriptive so one can directly write sane code with them.

The microlanguage would be a part of my bigger project metalanguage that compiles any other languages to the microlanguage which at the end (jit or cached) compiles either to browser-executable (html + javascript), either to native code. Because it wouldn't really be used for high level programming, but only as underneath intermediate language (analogous to some bytecode language), it shouldn't have all heavy weight addons like objects, structs or reactive constructs, but these addons should be able to compile from a guest language to the host microlanguage. It should be also easy to port the microlanguage to platforms like OSX, Windows, Linux or Android.

So I'm thinking about a minimal set of commands. Somehow I think it should be imperative language, because microprocessors work that way at low-level understanding. I was thinking of supporting only these:

  • variable assignment and arithmetic
  • pointers and arrays like in C
  • if branching
  • for and while loops with labels
  • defining functions with stack support
  • assembler passing engine like in C for system programming

Do these commands sound cool or there is something that can be changed for better speed/usability performance?

A2: Analog Malicious Hardware

Nothing much to do with language theory but too cool not to share.

While the move to smaller transistors has been a boon for performance it has dramatically increased the cost to fabricate chips using those smaller transistors. This forces the vast majority of chip design companies to trust a third party —often overseas—to fabricate their design. To guard against shipping chips with errors (intentional or otherwise) chip design
companies rely on post-fabrication testing. Unfortunately, this type of testing leaves the door open to malicious modifications since attackers can craft attack triggers requiring a sequence of unlikely events, which will never be encountered by even the most diligent tester.

In this paper, we show how a fabrication-time attacker can leverage analog circuits to create a hardware attack that is small (i.e., requires as little as one gate) and stealthy (i.e.,
requires an unlikely trigger sequence before effecting a chip’s functionality). In the open spaces of an already placed and routed design, we construct a circuit that uses capacitors to siphon charge from nearby wires as they transition between digital values. When the capacitors fully charge, they deploy an attack that forces a victim flip-flop to a desired value. We weaponize this attack into a remotely-controllable privilege escalation by attaching the capacitor to a wire controllable and by selecting a victim flip-flop that holds the privilege bit for our processor. We implement this attack in an OR1200 processor and fabricate a chip. Experimental results show that our attacks work, show that our attacks elude activation by a diverse set of benchmarks, and suggest that our attacks evade known defenses

A2: Analog Malicious Hardware. K. Yang, M. Hicks, Q. Dong, T. Austin, D. Sylvester, Department of Electrical Engineering and Computer Science, University of Michigan, USA.

Comment by Yonatan Zunger, Head of Infrastructure for the Google Assistant:

"This is the most demonically clever computer security attack I've seen in years. It's a fabrication-time attack: that is, it's an attack which can be performed by someone who has access to the microchip fabrication facility, and it lets them insert a nearly undetectable backdoor into the chips themselves. (If you're wondering who might want to do such a thing, think "state-level actors")

The attack starts with a chip design which has already been routed -- i.e., it's gone from a high-level design in terms of registers and data, to a low-level design in terms of gates and transistors, all the way to a physical layout of how the wires and silicon will be laid out. But instead of adding a chunk of new circuitry (which would take up space), or modifying existing circuitry significantly (which could be detected), it adds nothing more than a single logic gate in a piece of empty space.

When a wire next to this booby-trap gate flips from off to on, the electromagnetic fields it emits add a little bit of charge to a capacitor inside the gate. If it just happens once, that charge bleeds off, and nothing happens. But if that wire is flipped on and off rapidly, it accumulates in the capacitor until it passes a threshold -- at which point it triggers that gate, which flips a target flip-flop (switch) inside the chip from off to on.

If you pick a wire which normally doesn't flip on and off rapidly, and you target a vulnerable switch -- say, the switch between user and supervisor mode -- then you have a modification to the chip which is too tiny to notice, which is invisible to all known forms of detection, and if you know the correct magic incantation (in software) to flip that wire rapidly, will suddenly give you supervisor-mode access to the chip. (Supervisor mode is the mode the heart of the operating system runs in; in this mode, you have access to all the computer's memory, rather than just to your own application's)

The authors of this paper came up with the idea and built an actual microchip with such a backdoor in it, using the open-source OR1200 chip as their target. I don't know if I want to guess how many three-letter agencies have already had the same idea, or what fraction of chips in the wild already have such a backdoor in them."

Programming with a Differentiable Forth Interpreter

On arxiv. Abstract:

There are families of neural networks that can learn to compute any function, provided sufficient training data. However, given that in practice training data is scarce for all but a small set of problems, a core question is how to incorporate prior knowledge into a model. Here we consider the case of prior procedural knowledge such as knowing the overall recursive structure of a sequence transduction program or the fact that a program will likely use arithmetic operations on real numbers to solve a task. To this end we present a differentiable interpreter for the programming language Forth. Through a neural implementation of the dual stack machine that underlies Forth, programmers can write program sketches with slots that can be filled with learnable behaviour. As the program interpreter is end-to-end differentiable, we can optimize this behaviour directly through gradient descent techniques on user specified objectives, and also integrate the program into any larger neural computation graph. We show empirically that our interpreter is able to effectively leverage different levels of prior program structure and learn complex transduction tasks such as sequence sorting or addition with substantially less data and better generalisation over problem sizes. In addition, we introduce neural program optimisations based on symbolic computation and parallel branching that lead to significant speed improvements.

Theory of syntax extensions: does it exist?

I've become interested in the question of syntax extensions: not the general theory of extending syntax, nor the mechanisms for extending syntax (there are many, from the C preprocessor to Racket's extended syntax-case), but some basis for deciding what syntaxes should be provided and how they ought to be organized. Most languages have fixed syntax, either because their creators took it for granted or as a matter of principle. Those that do allow syntax extension usually have cultural constraints against overusing it, and the extended syntaxes provided by the standard library are a rag-bag whose members depend on history or the creator's whims or both. We seem to be at the same state where syntax libraries are concerned that we were in in the 1960s for procedure libraries.

Is there state of the art about this that I haven't heard about yet? Please inform and/or enlighten me.

XML feed