Turnstile+: Dependent Type Systems as Macros

In 2017, a team from Northeastern University released Turnstile, a framework for implementing propositionally typed languages in Racket; cf. naasking's story Type Systems as Macros. The system was really nice because it allowed type systems to be expressed in a manner similar to the way theoretical PL researchers would in a paper, and because it hooked into Racket's clean compiler backend.

Now Stephen Chang, one of that team, together with new coauthors Michael Ballantyne, Usamilo Turner and William Bowman, have released a rewrite that they call Turnstile+, together with a POPL article, Dependent Type Systems as Macros. From that article's introduction:

Turnstile+ represents a major research leap over its predecessor. Specifically, we solve the major challenges necessary to implement dependent types and their accompanying DSLs and extensions (which Turnstile could not support), while retaining the original abilities of Turnstile. For example, one considerable obstacle was the separation between the macro expansion phase and a program’s runtime phase. Since dependently typed languages may evaluate expressions while type checking, checking dependent types with macros requires new macrology design patterns and abstractions for interleaving expansion, type checking, and evaluation. The following summarizes our key innovations.

  • Turnstile+ demands a radically different API for implementing a language’s types. It must be straightforward yet expressive enough to represent a range of constructs from base types, to binding forms like Π-types, to datatype definition forms for indexed inductive type families.
  • Turnstile+ includes an API for defining type-level computation, which we dub normalization by macro expansion. A programmer writes a reduction rule using syntax resembling familiar on-paper notation, and Turnstile+ generates a macro definition that performs the reduction during macro expansion. This allows easily implementing modular type-level evaluation.
  • Turnstile+’s new type API adds a generic type operation interface, enabling modular implementation of features such as error messages, pattern matching, and resugaring. This is particularly important for implementing tools like tactic systems that inspect intermediate type-checking steps and construct partial terms.
  • Turnstile+’s core type checking infrastructure requires an overhaul, specifically with first-class type environments, in order to accommodate features like dependent binding structures of the shape[x:τ]...,i.e., telescopes [de Bruijn 1991; McBride 2000].
  • Relatedly, Turnstile+’s inference-rule syntax is extended so that operations over telescopes, or premises with references to telescopes, operate as folds instead of as maps

The code is available at https://github.com/stchang/macrotypes.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Macros again

It does bother me to see folks continuing to further invest in macros; I see practical reasons why they do it (just as I see practical reasons why physicists keep investing in string theory), but I don't see that understanding makes it any more laudable. For fexprs to catch up, though, they need to be competing; and for my part, I'm type-skeptical. I don't recall (perhaps it's slipped my mind?) attempting to explain on LtU my beef with types.

(1) These sorts of type systems have to support all correct formal reasoning that can be done. Gödel's theorems aren't friendly to that. I've wondered for a couple of decades or so, if the root of this is some sort of mis-step in setting up the problem. Viciously circular logic is an existential crisis, whereas the equivalent computation is just a program that's hung up in an infinite loop (it happens); is there a way, by embedding logical reasoning within the computational system (rather than vice versa), to bleed off the antinomies into mere non-halting computation? It's apparently impossible to prove (formally) this can't be done, because it involves stepping outside the rules; yet there's also no knowing that it can be done except to figure out how.

(2) Typing and computation put conflicting demands on programming-language design. Typing is formal proof about static expressions. Computation is what those static expressions actually do when evaluated. When you design a language to support typing, you're trying to make clear how the formal proof works. When you design a language to support computation, though, you're trying —as your first line of defense against getting the program wrong, before you even begin to reason formally about it— to make clear what the program does. If you can't tell, by looking at the program, what it does, that's a non-starter. And I suggest you cannot successfully design for both at once. Designing to clarify the formal reasoning obfuscates the algorithm; designing to clarify the algorithm obfuscates the formal reasoning.

on the goal of clear computation

Programmers will certainly write code I cannot understand. For example, see the procmail source code. Seems to be perfect, right?

Programmers, like human writers in other fields, are also bad and biased judges of whether their own code is 'clear'. They know what they meant, which interferes with reading what was actually written. Perhaps a few programmers can become the sort of experts that write clear code habitually. But, for most, an independent review is required to effectively judge clarity.

Even when each unit of code is written clearly, it can become a challenge to comprehend after we layer dependency upon dependency until we have a dense forest of code. That is, it doesn't matter whether each branch and leaf is clear if there are too many branches, too many leaves. Further, there are software packages and team development. Even if you read and understand what a program does at a specific point in time, it might change when you aren't looking, and perhaps you won't find the time or motivation to again 'look at the program' to understand the change and 'tell what it does'. If by accident you depend on some implementation detail that changes, behavior may break without obvious, traceable, or easily isolable reason.

I'm curious how you would propose to "design for" clarity, whether at small scales or larger ones. I believe we could build a culture where clarity of code is prioritized. But I doubt any culture would be stronger than the combined (1) ego of programmers who insist they know what they're doing, (2) first-to-market incentives that drive 'worse is better' and won't accept delays to improve clarity.

Types are certainly not perfect, but they usefully document and check our assumptions in the imperfect programming environment where humans aren't always clear, where subprograms change when we aren't looking. Of course, we could take the 'checked documentation' concept in a different direction from types (types are characterized by compositionality, respecting module boundaries), e.g. something closer to Hoare logic, satisfiability modulo theories, model testing that might get into the white-box or grey-box analyses.

Sapience, clarity

I reread that comment several times in seeking to articulate what was bothering me about it. The trouble seems to be not so much with the concrete assertions as with the, well, attitude; partly tone and emphasis, but then also in the nature of the conclusions explicitly or implicitly drawn. Human beings are assumed to get everything wrong that they're allowed to, yet non-sapient devices (here, typing) are treated with patience and forgiveness of their flaws. It seems one starts by assuming technology is superior to sapience, and comes to the conclusion that technology is superior to sapience. I have become increasingly aware, in the past few years especially, that our global culture has been permeated by a confusion of thinking in which technology is held up as an exemplar of what sapient minds would be if they weren't so flawed by being able to think. (Just saw this today: smbc) It's a sort of self-hatred, really. One can see how economics drives it: technology that avoids having to employ people provides an economic advantage, therefore an incentive to find and exploit strengths of technology over people and disincentive to look for strengths of people over technology, and then in selling the new technology an incentive to present it as inherently superior while avoiding mention of any advantage of a human alternative to what you're selling. Okay, enough of that rant.

Design for clarity is not something strange; at least, it shouldn't be. Arrange things to make it possible to express things clearly, and arrange things to make it easier to express things more clearly, more difficult to express them less clearly. Making it possible to express things clearly generally entails maximizing flexibility, avoiding rigidity; sapient thought, I've observed, is too volatile to fit neatly into rigid formal structure. Making it easier to express things more clearly than less is... tricky. Subtle. In the case of fexprs versus macros, the various opportunities for elegant expression, opportunities for inelegant expression, stability, and error-prone-ness get tremendously complicated. I'm still boggling, six years later, that when we were discussing the "goto fail bug" on LtU (yonder), observing a skew of suggestions toward technological checks, I made what I thought should be a very obvious observation that the problem would have been less likely to happen if the language required an if clause to have an explicit terminator — and got some pushback on the idea. (Not even pushback on my claim that the syntactic change would make the error less likely; yes we did subsequently have an interesting discussion, but first I was told it was socially irresponsible of me to suggest we should prefer syntax that isn't inherently error-prone. Holy crap.)

You mention Hoare logic, etc. As with types, these are attempts to impose technological standards on sapient thought. (I blogged on this a while back, suggesting the take-away lesson from Gödel's Theorem(s) should be that non-sapient ("formal") reasoning is less powerful than sapient thinking. Not formally less powerful than sapient thinking; if sapient thought is more powerful than formal reasoning, one of our first expectations should be that this would not be formally provable. As the saying goes, in theory there is no difference between theory and practice, but in practice there is. (blog post))

Technology vs. Sapience?

Type systems are just another tool. It'd be silly to judge a plow for making a poor hammer. And it's just as silly to judge a human for making a poor tool.

Arguably, asking people to reliably program clearly or to read and comprehend large codebases qualifies as "attempts to impose technological standards on sapient thought". Humans aren't reliable at anything other than being human.

One of the common ways that humans try to clarify, verify, and explain their intentions is to say the same thing in a few different ways and to describe external assumptions. I tend to think about types, Hoare logic, natural language documentation (comments), etc. as just another example of this behavior. But natural language documentation is currently only useful for other humans, while types and the rest can serve as documentation and clarification of intent to both humans and machines.

Perhaps in some distant future, a sapient machine could read natural language comments and associated documentation and tell humans, "nope, this function definitely isn't doing what was said in this sentence here". Wouldn't that be the ultimate type checker? Of course, in that future, the machine would probably write most 'low level' code. But we'd still want to say the same thing more than one way, to clarify and verify our intentions.

You might have missed his point

Nothing you said contradicted what John said. You might be talking right past him.

What John said is that, in the manner that we choose to express things like type systems in a human-written and human-readable form, we can make decisions that make it easier to express things correctly, and inversely, harder to express things that should not exist.

Those results are difficult to achieve, because it is somewhat unnatural to bend a design away from a pure abstraction and towards an "opinionated" grammar, without somehow sacrificing some desirable attributes of the grammar.

(Now let's see if I, too, missed John's point.)


I agree with the idea that languages can be designed to curtail common errors. I was part of that linked discussion. Though, whether this improves 'clarity' is a separate question.

I disagree with the entire framing of 'flexible thinking' as a viable defense, an alternative to reliable automated reasoning.

Also, although I grant type systems do hinder the 'flexible' thinking of humans, I believe this is mostly a reflection of our computation models. E.g. imperative programming also hinders many forms of 'thinking' about programs. Eliminating types doesn't really enable the human programmers in a meaningful way. What we really need is the ability to use type systems to model alternative paradigms. GADTs could help with this.


I see a lot of different things going on here (all of them interesting in their various ways), on roughly three different levels. Most immediately there's the technical level, where I'm quite interested in the remarks on GADTs (I might remark on that in the other subthread). Discussion of technical issues may sometimes get somewhat scrambled by divergence of thinking on other levels, so that level may benefit from untangling what's going on elsewhere. A level below is the purpose of programming languages; and deeper yet is the relationship between sapience ("thinking") and technology (computing, formal reasoning). We needn't end up agreeing on all these things, for clarifying thought at the deeper levels to help in discussing things above.

I take as given that there are some things the sapient mind does that computers simply can't do. This much seems glaringly obvious to me. Computers do not understand anything; they aren't even "stupid", they're just mechanical manipulators. Technological artifacts cannot in full generality judge whether a technological artifact performs acceptably, cannot judge when it's time to step outside the rules, cannot judge what do once one does step outside the rules. Cannot see that there is a bigger picture. (A particular task that demands a great deal of the sort of thing technology can't do is driving a car; it's tempting to spend a small rant on the sheer stupidity of assigning that task to a technological artifact.)

That's separate from the question of what computers can do better than sapient minds can. My phrasing may create the misapprehension that I'm unaware computers are good at some things we aren't; for example, if I say sapient reasoning is "more powerful" than formal reasoning. Computers compute, people think; and any attempt to simulate thinking by computation, or to simulate computing by thought, is not going to work well. Which said, the reason I'd say thinking is the more powerful of the two is that the things computers do better — the fast, precise mechanical manipulation of data on a massive scale — can be tediously tracked through by sapient minds, gradually becoming less and less feasible due to scale, whereas the things sapient minds are good at are just completely out of reach for technology (though creating the illusion of simulating thought is possible, right up until something happens to demonstrate the total lack of comprehension in the simulation).

When it comes to determining the "correctness" of computer programs, again there's a terminology problem. Some aspects of the rightness of a program may be subject to formal reasoning, while other aspects are not; the term "correctness" is often used to mean just the formal aspects, but I don't mean to use the term in that limited sense. Programming languages are a tool for sapient minds to specify what computers are to do, and as such, programming languages are a prime example of an interface that should enhance the sapient mind: amplify its strengths while shoring up its weaknesses. I perceive this sort of enhancement, for any human-machine interface, to be an extraordinarily formidable challenge. Static type checking seems on the face of it like a great way to use a strength of computers to make up for a weakness of people; but I've become deeply concerned that in the process of supposedly reinforcing us on that thing we're not good at, it winds us in red tape and thereby prevents us from exercising our strengths (let alone amplifying them). You asked, somewhere above, how I would go about actually doing this; that's back up at the technical level, and that's the sixty four dollar question. (Inflation has really taken its toll on that old expression, btw; the other day I heard some official say, probably in reference to COVID-19, something about the "billion-dollar question".) I appreciate the question, as it's getting to the heart of the matter. If I had a half-baked solution, I'd describe it for you; what ideas I have on the matter feel less that half baked. I do think a likely place to look for inspiration on this is in language strategies that don't use static typing; though that doesn't seem the only place worth looking.

"correctness" of computer programs

Some aspects of the rightness of a program may be subject to formal reasoning, while other aspects are not

True. There is the common distinction between verification (are we doing the thing right?) and validation (are we doing the right thing?). Formal reasoning helps a great deal with verification, but doesn't tell us whether we're solving the right problem or providing effective value to our users.

Programming languages are a tool for sapient minds to specify what computers are to do, and as such, programming languages are a prime example of an interface that should enhance the sapient mind

Unfortunately, PL designers don't really know much about enhancing the human mind, beyond apparent utility of support for domain-specific notations and problem-specific abstractions. We do know how to hinder human reasoning, for example with Turing tarpit PLs, or pervasive mutation in a concurrent system. But our history of experiments with different 'paradigms' hasn't revealed any obvious winners.

That said, it's certainly worth exploring other features that would enhance the effectiveness for a given level of human effort.

One idea I've been exploring is to model programs as weakly reflective, soft constraint systems. This would enable exploration of a design space, incremental refinement of the software design, and robust behavior when parts of the system change. Type safety would become a filter or guide on a search space of potential designs, rather than a barrier for writing a specific design.

enhancing the sapient mind

Just a couple of small thoughts re

Unfortunately, PL designers don't really know much about enhancing the human mind[...]. We do know how to hinder human reasoning, for example [...]. But our history of experiments with different 'paradigms' hasn't revealed any obvious winners.

Alas, I include static typing on the list of ways to hinder human reasoning; if there's a way to avoid that effect of static typing, I'd be surprised, as the effect looks to me awfully fundamental. As for ways to enhance, not to belittle any particular line of research going forward, here's a quote:

LISP has jokingly been described as "the most intelligent way to misuse a computer". I think that description a great compliment because it transmits the full flavour of liberation: it has assisted a number of our most gifted fellow humans in thinking previously impossible thoughts.

— E.W. Dijkstra, The Humble Programmer.

Types and reasoning

If static typed programs only hindered reasoning, then dynamic should be a clear winner for reasoning. But it is not. Why not?

Because for every barrier static types raise, they also reduce the burdens of defensive reasoning, enhance our ability to track down what breaks on a change, and automate the more laborious aspects of reasoning about behavior.

It's a tradeoff in the human reasoning aspect. Whether it's a win or loss probably depends on context and scale.


"If [static typing] only hindered reasoning, ..." — indeed, this is why I did not suggest it only hindered. There are some situations, though, where you need the coefficient of friction (or some such term) to be exactly zero, rather than merely small; that is, one needs to have certain things vanish entirely rather than merely being minimized. (Extending the metaphor, one may be adding something arbitrarily small that must not be washed out, or multiplying by something arbitrarily large that must not cause things to blow up.) That is either the origin, or perhaps an alternative view, of my mistrust of supposedly-fundamental tradeoffs. (Also, one may often understand a tradeoff better by considering scenarios that don't make that tradeoff.)

friction and fluidity

Can you provide an example of a situation where you "need the coefficient of friction (or some such term) to be exactly zero", and where dynamic types (perhaps combined with other features) actually achieves this in practice?

Based on my understanding and observations, dynamic types do not reduce what I might call 'frictions' in software development to anywhere near zero. Dynamically typed languages are like a glue, not a lubricant. Dynamic types don't reduce the need for frameworks or software packages or stable APIs. Adding a feature to a program still requires shotgun edits unless we've designed for that change up front. Dynamically typed codebases and ecosystems still calcify and decay and are very often fragile... they just hide the damage better it because there are no static types to sound the alarm. Of course, this can be mitigated by thorough automated testing, so it is still possible to achieve a reasonable degree of robustness.

The most positive thing I can say for dynamically typed languages is that they're typically more fluid up front. Like glue. What I mean by fluidity is that we can sloppily combine ideas and make them fit together. Related: stone soup programming. In contrast, statically typed languages conventionally feel more analogous to working with rigid solids - beams and walls and wires - and trying to fit things neatly together with onerous deliberation.

However, this fluidity does not depend on the specific feature of 'dynamic types', nor is it implied by dynamic types. The bare minimum definition of dynamic typing is that type errors are detected at runtime, causing a defined behavior such as halting instead of undefined behaviors. This does not imply features such as reflection (typeof), ad-hoc polymorphism (like adding number to string), duck typing, deferred binding, etc.. Dynamic types, by themselves, give us one advantage over static types: we don't need to worry about a type system not being 'advanced' enough to encode concepts such as heterogeneous lists.

Statically typed languages can easily be designed with features for fluidity - reflection, software that adapts in ad-hoc ways to its environment. And a few type system features - especially GADTs, row-polymorphism, and existential types - can give us the flexibility we need without requiring too much reasoning be performed within the type system.

I'm very interested in these fluid, flexible, statically typed programming languages, together with partial compile-time evaluation so we can harden software components down for real-time systems, mission-critical software, FPGA targets or similar. My current language is named Glas after that two-phase material glass. The idea is that we'd mostly work the language in liquid form, but allow it to harden for performance reasons (e.g. for FPGA or real-time systems). Monadic, soft constraint models seem to be a very expressive basis for staged, fluid programming.

I believe that, with static types, we can make the 'fluidity' of software much more robust and scalable. Where dynamically typed systems are briefly fluid for a new project, use of static types can explicitly preserve fluidity even across modules and packages and frameworks as projects and software ecosystems scale and mature.


I think I'd be satisfied with systems that support the kind of reasoning I need to do anyway, in order to convince myself that my code will work properly. Such systems would necessarily need to do their analysis before run-time, so call it what you want, it smells like type systems to me.

The crux is notation, not logical incompleteness. The promise of formal verification has always been that the computer could check your reasoning for you, once expressed in a suitable notation. And the hangup has always been that first step: expressing such reasoning is just too much work, making it impractical for all but the most exceptionally demanding cases.

I don't know if verification will ever be plausible for everyday programming, but a massive improvement in notation would be a necessary prerequisite.

Discussing correctness

A significant point here is, who is mean to read a programming language? If the only target audience is a computer, that's one thing; if the target audience is sapient programmers, that is another.

Sapient thought, I've observed, is far more flexible than formal reasoning, for much the same reason that automated translators don't work: the automation is limited to mechanically manipulating the syntax, whereas sapient minds can access the meaning. (This is allied with my supposition that the sapient mind is more powerful than a formal system; contrary to dmbarbour's apparent position in the above subthread.) Proofs in mathematics are a public conversation between sapient mathematicians, and thereby far more flexible than the sort of automated proof systems used for program verification. When a program becomes an attempt to explain what one is doing to a computer, this drags the determination of correctness down to the computer's non-sapient level; if the program is a conversation between programmers, determining correctness becomes a sapient discussion and potentially much more versatile. (For years I was puzzled that Lisp's characteristic features didn't seem to form a coherent whole; I didn't see what self-representation, garbage collection, and bignums have in common; till it dawned on me that these are all features that allow Lisp programming to be more of a conversation between programmers rather than a conversation with the computer.)

Flexible but Unreliable

My position is that relative power or flexibiliy of sapient thought is much less relevant than reliability, reproducibility, robustness of the reasoning. At least for the particular problem of correctness in large software systems.

We humans cannot continuously be vigilant readers of code in the face of change, nor disciplined clear authors of code. We aren't machines. Fatigue, limited attention span, concurrent development, Conway's Law, etc. limit us quite severely in practice. It simply doesn't matter that our thinking is flexible if we're too tired or too busy to deeply review changes.

The flexibility of sapient reasoning doesn't help with correctness unless you can automate the sapient reasoning.

Aside: You point to mathematics as proof of utility for sapient thought. I believe that if math were to scale and change as rapidly as software systems do, human reasoning would prove unreliable in that domain, too. Arguably, this is already the case.

Target audience

A programming language needs to address *both* human and machine, of course. This is what distinguishes it from a purely mathematical notation.

Having taken at least a cursory look at formal verification, I think the main problem is that the notation is still too closely tied to traditional mathematics, to be well adapted to practical programming use.

You also have the complicating factor that, at the end of the day, your program needs to run on an actual computer. And, efficient use of the actual computer is one of the things that some people want verification to help them address, rather than cover up and pretend it's not an issue. Not everybody, or most applications, but some.

Also, in actual practice, the vast majority of the code people write (and might hypothetically want verified, if it weren't too much trouble), doesn't come anywhere near the limitations of formal systems. This is the case even for most (though not all) algorithms of practical interest.

Limits of Formal Systems

the vast majority of the code people write (and might hypothetically want verified, if it weren't too much trouble), doesn't come anywhere near the limitations of formal systems

Unfortunately, most type systems today also aren't anywhere near the limitations of formal systems. There are exceptions such as F* which uses SMT solvers to analyze refinement types, and features such as GADTs which enable encoding more reasoning in code instead of the type system. Perhaps if those features become mainstream, programmers will encounter fewer issues with programming at the level they want and still verifying things.


Duplicate, sorry


Generalized Algebraic Data Types (GADTs) are essentially the FEXPRs of type systems.

GADTs encode rules for correct composition, equivalent to encoding structural constraints in a grammar. But most reasoning is not encoded in types, rather in functions operating on the GADT. Like FEXPRs, GADTs give us a first class high level language and defer implementation details.

You ask about the possibility of "embedding logical reasoning within the computational system" instead of types. Well, GADTs are one way to do it.

A problem oriented perspective

There is a large number of challenges to understanding code, some of which are avoidable, but many of them are not. They are problems of the underlying domain, and the program is there to adequately express the problem. It makes the existing difficulty apparent.

Of course, in many cases, programming languages express the difficulty of a consensus between different objectives. Also of how to deal with the technological debt built up through the history of a language. But every domain has its own.

Static type systems are a particular way to express a certain kind of problems. But they are not a universal solution, nor are they an absolute necessity. What I like about them is that you can refactor very boldly and be told where you have to make amendments to keep your changes consistent. But types tend to enforce making expectations explicit. You may not want to enforce your own expectations on others.

It would be good to know what kind of problems static typing exposes well and what kind of problems it hides or renders inaccessible.