Among many interesting works, the POPL 2016 papers have a bunch of nice articles on Gradual Typing.

by Matteo Cimini, Jeremy Siek
2016

Many languages are beginning to integrate dynamic and static typing. Siek and Taha offered gradual typing as an approach to this integration that provides the benefits of a coherent and full-span migration between the two disciplines. However, the literature lacks a general methodology for designing gradually typed languages. Our first contribution is to provide such a methodology insofar as the static aspects of gradual typing are concerned: deriving the gradual type system and the compilation to the cast calculus.

Based on this methodology, we present the Gradualizer, an algorithm that generates a gradual type system from a normal type system (expressed as a logic program) and generates a compiler to the cast calculus. Our algorithm handles a large class of type systems and generates systems that are correct with respect to the formal criteria of gradual typing. We also report on an implementation of the Gradualizer that takes type systems expressed in lambda-prolog and outputs their gradually typed version (and compiler to the cast calculus) in lambda-prolog.

One can think of the Gradualizer as a kind of meta-programming algorithm that takes a type system in input, and returns a gradual version of this type system as output. I find it interesting that these type systems are encoded as lambda-prolog programs (a notable use-case for functional logic programming). This is a very nice way to bridge the gap between describing a transformation that is "in principle" mechanizable and a running implementation.

An interesting phenomenon happening once you want to implement these ideas in practice is that it forced the authors to define precisely many intuitions everyone has when reading the description of a type system as a system of inference rules. These intuitions are, broadly, about the relation between the static and the dynamic semantics of a system, the flow of typing information, and the flow of values; two occurrences of the same type in a typing rule may play very different roles, some of which are discussed in this article.

by Asumu Takikawa, Daniel Feltey, Ben Greenman, Max New, Jan Vitek, Matthias Felleisen
2016

Programmers have come to embrace dynamically typed languages for prototyping and delivering large and complex systems. When it comes to maintaining and evolving these systems, the lack of explicit static typing becomes a bottleneck. In response, researchers have explored the idea of gradually typed programming languages which allow the post-hoc addition of type annotations to software written in one of these “untyped” languages. Some of these new hybrid languages insert run-time checks at the boundary between typed and untyped code to establish type soundness for the overall system. With sound gradual typing programmers can rely on the language implementation to provide meaningful error messages when “untyped” code misbehaves.

While most research on sound gradual typing has remained theoretical, the few emerging implementations incur performance overheads due to these checks. Indeed, none of the publications on this topic come with a comprehensive performance evaluation; a few report disastrous numbers on toy benchmarks. In response, this paper proposes a methodology for evaluating the performance of gradually typed programming languages. The key is to explore the performance impact of adding type annotations to different parts of a software system. The methodology takes takes the idea of a gradual conversion from untyped to typed seriously and calls for measuring the performance of all possible conversions of a given untyped benchmark. Finally the paper validates the proposed methodology using Typed Racket, a mature implementation of sound gradual typing, and a suite of real-world programs of various sizes and complexities. Based on the results obtained in this study, the paper concludes that, given the state of current implementation technologies, sound gradual typing is dead. Conversely, it raises the question of how implementations could reduce the overheads associated with ensuring soundness and how tools could be used to steer programmers clear from pathological cases.

In a fully dynamic system, typing checks are often superficial (only the existence of a particular field is tested) and done lazily (the check is made when the field is accessed). Gradual typing changes this, as typing assumptions can be made earlier than the value is used, and range over parts of the program that are not exercised in all execution branches. This has the potentially counter-intuitive consequence that the overhead of runtime checks may be sensibly larger than for fully-dynamic systems. This paper presents a methodology to evaluate the "annotation space" of a Typed Racket program, studying how the possible choices of which parts to annotate affect overall performance.

Many would find this article surprisingly grounded in reality for a POPL paper. It puts the spotlight on a question that is too rarely discussed, and could be presented as a strong illustration of why it matters to be serious about implementing our research.

by Ronald Garcia, Alison M. Clark, Éric Tanter
2016

Language researchers and designers have extended a wide variety of type systems to support gradual typing, which enables languages to seamlessly combine dynamic and static checking. These efforts consistently demonstrate that designing a satisfactory gradual counterpart to a static type system is challenging, and this challenge only increases with the sophistication of the type system. Gradual type system designers need more formal tools to help them conceptualize, structure, and evaluate their designs.

In this paper, we propose a new formal foundation for gradual typing, drawing on principles from abstract interpretation to give gradual types a semantics in terms of pre-existing static types. Abstracting Gradual Typing (AGT for short) yields a formal account of consistency—one of the cornerstones of the gradual typing approach—that subsumes existing notions of consistency, which were developed through intuition and ad hoc reasoning.

Given a syntax-directed static typing judgment, the AGT approach induces a corresponding gradual typing judgment. Then the subject-reduction proof for the underlying static discipline induces a dynamic semantics for gradual programs defined over source-language typing derivations. The AGT approach does not recourse to an externally justified cast calculus: instead, run-time checks naturally arise by deducing evidence for consistent judgments during proof-reduction.

To illustrate our approach, we develop novel gradually-typed counterparts for two languages: one with record subtyping and one with information-flow security types. Gradual languages designed with the AGT approach satisfy, by construction, the refined criteria for gradual typing set forth by Siek and colleagues.

At first sight this description seems to overlap with the Gradualizer work cited above, but in fact the two approaches are highly complementary. The Abstract Gradual Typing effort seems mechanizable, but it is far from being implementable in practice as done in the Gradualizer work. It remains a translation to be done on paper by skilled expert, although, as standard in abstract interpretation works, many aspects are deeply computational -- computing the best abstractions. On the other hand, it is extremely powerful to guide system design, as it provides not only a static semantics for a gradual system, but also a model dynamic semantics.

The central idea of the paper is to think of a missing type annotation not as "a special Dyn type that can contain anything" but "a specific static type, but I don't know which one it is". A problem is then to be understood as a family of potential programs, one for each possible static choice that could have been put there. Not all choices are consistent (type soundness imposes constraints on different missing annotations), so we can study the space of possible interpretations -- using only the original, non-gradually-typed system to make those deductions.

An obvious consequence is that a static type error occurs exactly when we can prove that there is no possible consistent typing. A much less obvious contribution is that, when there is a consistent set of types, we can consider this set as "evidence" that the program may be correct, and transport evidence along values while running the program. This gives a runtime semantics for the gradual system that automatically does what it should -- but it, of course, would fare terribly in the performance harness described above.

Some context

The Abstract Gradual Typing work feels like a real breakthrough, and it is interesting to idly wonder about which previous works in particular enabled this advance. I would make two guesses.

First, there was a very nice conceptualization work in 2015, drawing general principles from existing gradual typing system, and highlighting in particular a specific difficulty in designing dynamic semantics for gradual systems (removing annotations must not make program fail more).

by Jeremy Siek, Michael Vitousek, Matteo Cimini, and John Tang Boyland
2015

Second, the marriage of gradual typing and abstract interpretation was already consumed in previous work (2014), studying the gradual classification of effects rather than types.

A Theory of Gradual Effect Systems
2014

Effect systems have the potential to help software developers, but their practical adoption has been very limited. We conjecture that this limited adoption is due in part to the difficulty of transitioning from a system where effects are implicit and unrestricted to a system with a static effect discipline, which must settle for conservative checking in order to be decidable. To address this hindrance, we develop a theory of gradual effect checking, which makes it possible to incrementally annotate and statically check effects, while still rejecting statically inconsistent programs. We extend the generic type-and-effect framework of Marino and Millstein with a notion of unknown effects, which turns out to be significantly more subtle than unknown types in traditional gradual typing. We appeal to abstract interpretation to develop and validate the concepts of gradual effect checking. We also demonstrate how an effect system formulated in Marino and Millstein’s framework can be automatically extended to support gradual checking.

Difficulty rewards: gradual effects are more difficult than gradual simply-typed systems, so you get strong and powerful ideas when you study them. The choice of working on effect systems is also useful in practice, as nicely said by Philip Wadler in the conclusion of his 2015 article A Complement to Blame:

I [Philip Wadler] always assumed gradual types were to help those poor schmucks using untyped languages to migrate to typed languages. I now realize that I am one of the poor schmucks. My recent research involves session types, a linear type system that declares protocols for sending messages along channels. Sending messages along channels is an example of an effect. Haskell uses monads to track effects (Wadler, 1992), and a few experimental languages such as Links (Cooper et al., 2007), Eff (Bauer and Pretnar, 2014), and Koka (Leijen, 2014) support effect typing. But, by and large, every programming language is untyped when it comes to effects. To encourage migration from legacy code to code with effect types, such as session types, some form of gradual typing may be essential.

## Comment viewing options

### Remark on FLP and λProlog

Concering the first papers comment.

Dear All,

In my opinion, lambda Prolog isn't functional in the same sense as Prolog isn't functional. In both languages there is no definition mechanism for functions.

Functions are only a special case of predicates that happend to be functional in certain arguments that are viewed as input and output. But we want to retain flexibility and to use a predicate in any direction.

In as far lambda Prolog only offers the possibility to define predicates with higher order terms using higher order bodies, and I doubt that this is functional logic programming as somebody would understand it.

Bye

P.S.: So I guess summoning functional logic programming as is done in this wikipedia article is also wrong:
https://en.wikipedia.org/wiki/Functional_logic_programming
The situation is slightly different in Mercury, since Mercury has syntactic sugar for functions, but in the end they are also mapped to predicates, namely we find the following remark here:

Page 10, Mercury by Example
"It is worth noting that all three of our definitions are computationally identical and there is no difference in the code generated by the Mercury compiler."
https://www.mercurylang.org/documentation/papers/book.pdf

Some strange beast is Erlang, which has evolved from Prolog, is
functional and higher order, and not anymore relational.

So maybe Curry could count as the only true example in the Wikipedia article as a FLP. But I don't know enough about Curry. But one could add more to the list, like Isabelle/HOL which has rewriting and deduction. Etc..

### 5 real world examples of problems we schmuck are trying to solve

Can somebody give me five real world examples? Wadler names one - web programming in Links. Leijen gives two - Markdown processor and tier-splitted chat application (which is really a specific example of Wadler's web programming).

Also, I posit Leijen and others (including Meijer and Wadler) make the same error over and over again, and I exchanged emails with Meijer and Wadler on this very point years ago! And they continue to make the same mistake: Wadler defines blame as relative, where one type system is 'stronger' than another. For Wadler, blame requires consistency is reasoning. But no such techniques are really available in the real world and in most places where people want to use gradual typing. Leijen edits the mistake directly into his paper: "It is advantageous to write both the client and server part as one program since that enables us to share one common type definition for the data they exchange." In practice, why is it advantageous? The question we should be asking is, How do we design systems that "don't go wrong" when the server's schema changes and the client is out of sync?

Hopefully researchers will start listening.

### Gradual typing, contracts, and blame

Your remark is touching a delicate point about which I'm not sure blame researchers all agree on (I'm not an expert in this area). Gradual typing and Contracts overlap, but they are far from being the same thing.

In the Gradual Typing work, and in Wadler's work on blame, a part of the code has the privileged status of being "typed": it has passed a static check that verified correctness of typing. So there it is fair to assume, as you say, "consistency of reasoning". As a price to pay, you are limited to the forms of reasoning for which we know how to design simple type systems -- or have to use a baroque, complex type system and can't write programs easily.

In contrast, the Typed Racket work takes place in a wider setting of "contracts" where mutually-untrustful modules may impose whatever conditions on what is passed to them and make promises on what they will pass to others. Gradual typing is a specific case of contracts, and is (if I understand correctly) realized as such in the Typed Racket implementation, but the general contract system is richer in that it does not have this asymmetry of always blaming the untyped side from the typed side. Blame makes sense in this more general world of contracts without "consistency of reasoning", and there is a lot of work on Typed Racket to assign blame in this wider setting.

Note that I think that this distinction between typing contracts and more general contracts probably matters in terms of runtime system performance -- the topic of the "Is Sound Gradual Typing Dead?" paper. It may be the case that the typing contracts are easier to optimize, using the usual techniques of dynamic language implementation, than more general kind of contracts that require stronger symbolic evaluation capabilities -- this was one of the claims of the Pycket work that seems rather convincing.

I'm not sure about your point that detecting and correctly reporting mismatching assumptions is a bad topic to work on, because you think that automatic adaptation is more useful in practice. I suspect that we would need progress on the two fronts, and use them in different settings. (Precise API information is probably also useful if you hope to do automatic mediation.)

### We, the Schmuck

I'm not sure about your point that detecting and correctly reporting mismatching assumptions is a bad topic to work on, because you think that automatic adaptation is more useful in practice.

I said correctness is subjective, but manifested as objective using fancy mathematical definitions (one type system is stronger than the other, in the most trivial sense of unitypes vs anything with more than one type! Such trivial definitions are unknown-useful for math, but not necessarily known-useful for life). If there is a performance issue with this Ivory Tower, then my question is: Can we tear the tower down and build a new one? Sorry for waxing poetically, but your (excellent) OP touched on a delicate point for me, too. I literally begged Wadler and Meijer via e-mail years ago to consider that they were wrong, and the response I got back was merely, "Interesting food for thought."

If you recall, I even got into a design debate with the architect of Google Web Toolkit here on LtU about this very idea, when I pointed out the notion of using linkers and loaders to reason about dynamic partitioning in multi-tier applications was really a schema design and schema upgrade problem, and that any 'language' that ignored these versioning issues had little chance of succeeding.

By the way, I meant "consistency is reasoning", which is a mode of thinking expressed by many who do not allow for a more fuzzy realm of ideas. For example, artificial intelligence research often tries to train neural networks to discover/learn/organize ontologies but they often don't allow for "unsharpness", because they are (overly) concerned with accuracy and precision.

Ergo, my initial question to ask for real world examples. I want to know what Wadler wants to be sharp about. And not just some hand-wavy "effects" thing. Because, if describing effects is the purpose and performance is an issue, maybe we are best off just using AI and not worrying too much about performance (as one possible alternative mode of thinking).

(Precise API information is probably also useful if you hope to do automatic mediation.)

Sandro's comment addresses this. Precision is relative to whether the Closed World Assumption is in place, and once you make that choice/assumption, many design possibilities can happen on each side of that fork in the road.

### Is schema change really all

Is schema change really all that different than other dynamic behaviours, like revoking an access privilege? The authority to access a resource is always subject to revocation so clients "should" always deal with this possibility, so how is schema upgrade any different?

Even if it is different, doesn't this simply argue for more structural typing? After all, if the part of the API the client actually uses hasn't changed, then this wouldn't break clients if it were structurally typed.

Contracts seem pretty fine-grained as you can get for structurally matched semantics, so if a client no longer match the preconditions of a server invocation due to a server upgrade, that argues quite strongly that "not going wrong" entails "this client can no longer use this service", which seems to be what you're decrying. What alternative is there?

I have thought more about this work since the LtU submission, and I came to the conclusion that my answer to the title question is a resounding "no". The article puts the projector on the fact that runtime performance for gradual typing are poorly understood, not studied seriously enough yet, and very challenging.

If anything, that makes gradual typing *more* interesting: I already thought this was something we wanted, but this is something we want *and* that raises difficult implementation questions. This is great, because it means an opportunity to apply different skills at this part of the problem, to get people onboard that are maybe less directly interested in the theory. What's a good runtime for contracts and gradual typing?

Some people are excited about homomorphic encryption, also its current presentations are not usable in practice. Gradual typing is in a much better shape, but I think it should be thought equally as of an implementation challenge, in addition to the (interesting) theoretical questions.

### What does soundness of gradual typing mean and is it important?

At the outset, we know that we can't automatically dynamically check correctness of sufficiently expressive function types applied to parameters (is Collatz total, etc.). So it looks like most researchers move the bar to the next reasonable thing, checking that primitives are well typed in whatever dynamic contexts actually arise at run-time.

In my opinion, tools supporting this kind of checking and blame assignment would be useful, but ought not be tightly integrated into the language semantics. I'm thinking type annotations should supply assertions that are supposed to hold and the type system provides a (hopefully useful) way to mechanically propagate these assertions. There shouldn't be an implied operational semantics in the case of type errors.

When you're producing a build to represents those semantics, you may want a release build that performs only the minimal checks needed to ensure memory safety (or maybe not even all of those), or you might want a debug build that checks as many types as it can.

If we had a system that supported principled integration of types with varying specificity, as above, I'm skeptical that users would be clamoring for better theoretical guarantees about what gets checked in the debug build.

### Inspecting source code at run time

At the outset, we know that we can't automatically dynamically check correctness of sufficiently expressive function types applied to parameters (is Collatz total, etc.).

Can't we? I think if we try to typecheck them without looking at their source code, we'll have a hard time with it, and that's the status quo in most languages. But if this is part of the language itself, then it's fine to look at the source code at run time and apply a static typechecking algorithm to it. It'll be just like how the garbage collector can traverse through references without respect for data encapsulation.

This does mean that a function leaks some implementation details. Maybe the function's author didn't mean to reassure you that it was a total function, but you found out anyway by checking a type other than what the author intended.

Hmm, if this were about dynamically loaded code, rather than gradual typing, that leak would actually be encouraging: Since you were able to load the code as data, you indeed have access to all its implementation details, so this kind of leak would be an encouraging sign that the language aligned snugly with the reality of the situation.

Perhaps gradual typing would be elegant once combined with the notion that every piece of code in the system has been dynamically loaded by someone. To treat an untyped value as a typed one, you can supply a proof that the one who loaded it is you.

### Maybe the function's author

Maybe the function's author didn't mean to reassure you that it was a total function, but you found out anyway by checking a type other than what the author intended.

I guess it depends on the type system you have in mind, but the problem of inferring whether a function is terminating is undecidable. That's what I meant that we can't do it automatically.

### Annotations provided by the caller

We have statically typed languages that ensure termination. If the typechecker processes untyped code along with a specific type to check it against, that's like processing typed code.

That said, statically typed programs usually have more than one type annotation scattered throughout them. If the typechecker only has one annotation to work with, it'll have to do a lot of inference if the untyped code it's checking makes an untyped function call into some other untyped code. For some type systems, maybe this requires too much inference, like you're saying.

Maybe I can salvage what I'm imagining by replacing parameter type annotations with typing annotations (e.g. a type along with a mapping of top-level definitions to types), but it seems to be turning into a complicated and cumbersome approach. Would it even be in the spirit of gradual typing? The program ends up being a fully typed program where some type annotations are merely in a different source location from the annotated code.

### If the typechecker processes

If the typechecker processes untyped code along with a specific type to check it against

I think I see what you're saying, but it doesn't make much sense to me. Suppose the typed function has a type like "total Nat -> Nat" and the dynamic argument is the Collatz function. The annotations needed to prove Collatz total would are presumably quite specific to the logic of that function and wouldn't make sense in the context of the static function.

### Sufficiently expressive types

To express Collatz at the type level would require type level integers and a sufficient portion of number theory to allow a type like:

n:Int -> if even(n) then n / 2 else 3 * n + 1


I am not even sure a function with this type is meaningful or could exist. A pure type level integer doesn't seem very meaningful, but an array dimension might work as an example:

f :: Array x n:Int -> Array x (if even(n) then n / 2 else 3 * n + 1)


Which would be a function 'f' that accepts an array of length 'n' and returns an array one iteration of the Collatz function away. Even this is not expressive enough, and we need to add type level recursion, such that we can say:

f :: Array x n:Int -> Array x (collatz n)


This can come from unexpected places like type classes (assuming backtracking is used when matching):

class Collatz x y | x -> y
instance Collatz 1 y
instance (Even x, Collatz (x/2) (y+1)) => Collatz x y
instance (Odd x, Collatz (3*x+1) (y+1)) => Collatz x y

f :: (Collatz n m) => Array x n -> Array x m


This is clearly a problem for a sufficiently expressive type system, but I think a more interesting question is: do any interesting functions have this type? Clearly type level arithmetic is meaningful, for example tracking the size of an array:

concat :: (Add n m p) => Array x n -> Array x m -> Array x p


But does Collatz have any meaningful use? Are all useful functions typeable by properly terminating type expressions? That is an interesting question.

### Oh! Term-level proofs are unlikely in untyped code. Hmm.

Oh, I see. It sounds like you're saying that if we try to pass an untyped implementation of the Collatz function to a place that explicitly expects a total function, we don't know if that's valid or not (even at run time). The type checker would need to search for a classical proof that perhaps even the whole world hasn't managed to find yet, and perhaps it would never find the proof it needs at all.

As I formulated a response to this, I realized that this is indeed a complication that I don't have an answer to.

Usually, I've come to expect that a type annotation not only expresses that the value has the type in question, but also that the proof of this fact is easily obtained. If a certain kind of proof is beyond the range of what the type checker can easily search for, I would expect them to be made available explicitly, by passing around proofs at the term level.

But if the selling point of gradual typing is that some parts of the code would be developed by in an untyped style, I shouldn't expect that style to make any use of term-level proofs at all. And if there are no term-level proofs, I don't know how to close that gap between provably total functions and type-checkably total functions.

### Type Annotations

It would seem that if the collatz function were defined in an untyped way, and then imported into the typed section, the type checker would get stuck in an infinite loop. Pragmatically you would set a recursion depth limit on the type checker, so that in this case it would get stuck for a few seconds, and then emit something like: "unable to prove/disprove function 'collatz' total".

This would then require the collatz function when imported into the typed code to have a type annotation declaring it total (so the programmer is telling the type system to believe it is total).

The annotations could also take the form of proof-search hints or strategies so that the type system can quickly complete the required proof.

### Yes, these last two posts of

Yes, these last two posts of Ross and yours are what I was concerned about. The pragmatic thing, IMO, is to separate what's being asserted by the type from the question of whether or not it's provable. When you attempt to pass Collatz as a function parameter that's typed total, the type system should respond in a simple way, probably by immediately giving up. But the semantics should be such that you now have an unchecked assertion that makes your program erroneous if it fails to hold. It should be possible to use a proof assistant to establish this unchecked fact, assuming the Collatz conjecture is provable. I think trying to squeeze this kind of proof into the type system and enforce that the type system be sound is counterproductive.

I have a 90% written paper that introduces/proves sound the logic that I want to use for this base level language semantics. It's nifty, IMHO :). I've been writing it for years now, but it's been a lot of fun.