IDE design for immediate feedback

I recently saw an inspiring presentation about how programming frameworks can help software developers, by providing more immediate feedback on the effect of changes to your code and better visualizations to understand what your code is doing. I found it thought-provoking, and I thought others here might, too.

Bret Victor, Inventing on Principle. CUSEC 2012.
http://vimeo.com/36579366

Enjoy!

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Yes, we've been doing this

Yes, we've been doing this since 2007 (at least).

Was superglue ever released?

Or was there a video? A demo makes things more real for the audience.

I recently saw a raspberry Pi for the first time. There's no comparison to just reading about it in the abstract.

There was/is a video (many

There was/is a video (many actually), they are embedded in the paper! You'll need to read the paper in Acrobat/Acroread on a computer with quicktime installed to view the videos. Ya, this wasn't the right way to do this, but it seemed like a cool solution to this problem at the time.

I never released SuperGlue, the story was very incomplete and the prototype never really advanced beyond demoware. You could do some cool things with it, but never anywhere near what I would call useful programs. My current tablet work is the spiritual to SuperGlue, and I've added live programming features back in my most current demo, which goes along with my latest paper, but I don't talk about these there. I am, however, working on a demo video to show off my current system, maybe not as cool as Bret's demos, but it will have many of the same elements.

In planning my next project, my goal is to make live programming real: this means it has to be useful in writing real programs, not just toy UI programs (my big problem in the past). Imagine writing a compiler where you can see, in real time, how changes in your code change the processing of your test input. Imagine editing not just code, but code overlaid with their execution trees.

Imagine writing a compiler

Imagine writing a compiler where you can see, in real time, how changes in your code change the processing of your test input. Imagine editing not just code, but code overlaid with their execution trees

This is cool. A good way to leverage unit tests.

"Real Programmers", of course, edit execution trees implicitly. Programming is like chess in this respect.

Execution trees

"Real Programmers", of course, edit execution trees implicitly.

Funny as a joke, but not convincing as a point. This doesn't account for refactoring for example.

(I initially planned to answer that we could also edit type derivations, after constraining (by edition) the type domain enough for them to capture only interesting execution trees. Refactoring suggests that we are in fact editing the stream of "futures" of our program, generally to make it converge towards correctness/runability, but also sometimes to give it a more appropriate structure that helps productivity. The analogy to chess still stands, only the winning situation is the finished source code, not the current version's execution result.)

Subtext

I only want to overlay execution details on top of the code, or project the code on the execution tree, I'm not going to try the edit tree execution thing, as far as I know only Subtext (also a live programming language) has explored that. Editing the code for a procedure would still effect every invocation of it, not just the particular stack frame it was projected over when edited.

The idea is to reduce some cognitive burden by laying the execution out in front of the programmer (they don't have to do it all in their head).

That's what I took you to be

That's what I took you to be saying.

Funny as a joke, but not

Funny as a joke, but not convincing as a point. This doesn't account for refactoring for example.

That's why we don't do refactoring ;-)

I don't think I got the

I don't think I got the joke. But as for why we don't do as much refactoring as we, sometimes, should, it's been mostly because of the business folks on my end... (YMMV, sure)

For, maybe I was especially unlucky, but it hardly ever happens that they even grant us the time to do so !

A sketch of an interactive programming system.

I have a sketch in my mind of an IDE whose main purpose is to make it known to the programmer what the compiler/interpreter "believes" their code means, and how that system would work.

Many (most?) bugs manifest as a disagreement between the programmer and the system as to what the code means. A programmer usually knows a lot more about the code than s/he actually asserts or declares, and everything not asserted or declared, in most languages, is not checked. My thought is that whatever information we can make available about the meaning as viewed by the system would be checked casually by the programmer, and make his/her job much easier, if it were on view parallel to the code while the programmer is working on it -- and most particularly when the programmer is investigating a bug that is likely to be revealed by it.

There is a convention for dealing with ancient texts whose meaning is obscured by differences in writing style, language, or culture. It's called the "gloss." You have the text itself reproduced, and then parallel to it, you have the gloss, which is a commentator or analyst's explanation of what it means and why. We're familiar with a couple of different kinds of glosses; everybody's seen the "Annotated Shakespeare" with footnotes explaining the meaning of Elizabethan cultural references like "Get thee to a Nunnery" and so on. Most of us have seen foreign-language documents and books with interlinear or facing-page translations. Both of those are examples of glossing.

When working on code, I'd like to have a "gloss" parallel to it that says things about sections of code, according to the most recent compilation / check / instrumented run / whatever. Bonus points if recompilation / rechecking / whatever happens at intervals, and the messages update while I have the file open, without interrupting my editing.

At a first-cut, you could just put warning and error messages from compilation next to the actual code they refer to. As a second cut, suck in some existing code-checking tools like "lint" etc, and put those messages next to the code too. As a third cut, if your system infers type information, the "gloss" parallel to the code should tell you what type information it has inferred. Calls could be annotated with information about whether or not the system can optimize them as "tail calls" and handle them without growing the stack. Sites where allocation happens could be annotated with information about the number of objects allocated here, and their lifetime if information is available, and how many got automatically/instantly deallocated at end-of-life and how many had to get hunted down and reaped (after an unpredictable delay) by GC.

Loops could be annotated with number of times they're getting repeated in a run. Macro invocations should be glossed with the macro-expanded code (which in turn can be glossed with compilation messages so the programmer can *SEE* the error in the macro-expanded code the system is talking about). If profiler output is available, the amount of time spent executing each procedure could be included. Functions and procedures could be annotated with information about whether and where they were inlined. And so on.

The amount of information that can be made automatically available is vast, and many times having it available to look at, casually, while perusing the code, would "surprise" the programmer and enable them to easily locate a bug or identify an opportunity to optimize.

The UI for such a large amount of information would have to allow some picking and choosing, but I think that ought to be a fairly easy choice between several glosses, each created for a particular purpose. You'd have a memory-management gloss, a flow-of-control gloss, a speed profiler gloss, a correctness gloss, a type gloss, and so on. Each different gloss would collect a bunch of information relevant to debugging a particular aspect of the code you're working on, and display it right next to your code, scrolling when the code scrolls, etc.

You could even have custom glosses, like a "pure-functional" gloss that puts red flags next to any code that causes side effects or calls code that causes side effects, or a "code style standards" gloss that puts style-checker output next to the relevant line.

Another area with substantial room for improvement is assertions and declarations. They are good for checking programmer's beliefs about how code works and making sure the system agrees, but you never really declare/assert everything you know, because if you did you'd clutter up the code to the point of obscuring its structure or requiring "more than a screenful" to see the whole of a coherent unit. In order to encourage their use, we need to enable them to be used without taking up too much visible space in the source code. They can be part of the text source file, but the system doesn't always need to display them by default. You could have some that always display because you've picked them for their documentation value, and some that you've "hidden" to de-clutter your view of the code.

In an interactive system why shouldn't any number of assertions and declarations, unless *chosen* for inline display, just show as an asterisk at the end of a line? They'd auto-expand (and turn red or something) in the case of an assertion failure or declaration mismatch; otherwise, if you want to know what it's declaring or checking, you click on it for the expansion.

If you want to know what *ALL* the assertions are checking, pick the "correctness" gloss in the gloss panel, and they'll get displayed parallel to the code, along with commentary from the system such as,
"This assertion was not checked during runtime because this property was statically proven to be true by the compiler" or "runtime checking turned off, click _here_ to turn back on," or even "warning, this assertion condition is non-pure. Side effects of checking may affect the correctness of the code." etc.

This all seems do-able; I mean, sure, it'd be a lot of work, but probably less than is going into most modern IDE's, and IMO probably more valuable to the programmer.

Bear

How much semantic feedback

How much semantic feedback do you expect to get from the static type system and/or various responsive static analyses? I definitely appreciate semantic feedback (I state this very prominently in the 2007 Onward! paper), but I think people won't start going "wow" until we include dynamic feedback from the executing program. It isn't very surprising that this work is progressing in the context of dynamic languages (e.g., lighttable and clojure), rather than in static languages; not that statically typed languages can't benefit from dynamic feedback, but just that many in the static typing community do not have the right mindset (worrying too much about static correctness and not enough about the executing program).

At the very least inferred type information.

At the very least, static analysis can reveal inferred type information. It's the style in new statically typed languages to let the programmer skip type declarations where the system can infer them. But if the programmer thinks something is just an array of characters and the type inference mechanism has decided it must be a member of the string class, it would be better to have the inferred type visible in a type gloss.

Also, every warning that a conventional compiler can generate, including comments like "branching recursions are inefficient" and so on, results from static analysis. Why not display them next to the code they refer to?

Static analysis can reveal whether calls are tail calls that can be handled without growing the stack, whether functions and procedures can be inlined, what macroexpanded code looks like, whether "weighty" return values can be constructed in place or have to be copied back to the caller, etc. Static analysis is also the basis of basically everything 'lint' and a bunch of related tools ever did, and new static analysis tools are still getting better.

So, yeah, I'd say there's a lot of semantic information, including but definitely not limited to type information, that can be extracted statically. Drastically more than appears from a "casual" reading of the source code itself.

Ray

I agree there is static

I agree there is static information to present; especially in a language like Scala where you'd like to be able to view inferred types or be aware of implicit conversions (I here they finally have that now, I couldn't get the trees for that in my day). The 100% sure stuff is great to have around if its easily gotten.

Now, judgement calls, heuristics, performance hints? Programmers really don't like being told what the compiler thinks if it isn't very mostly true, at least all the time! Lint is a great example of something that I want to forget most of the time. I guess we could have clippy popup and say..."looks like you are writing slow code, let me teach you how to optimize it" or "looks like you have a bug!" Now, if you can build a real-time theorem prover, then that is something else.

Macro expanded code is something else, but then I want that for dynamically resolved function calls also :)

Dynamic feedback?

Haven't static languages had this for decades under the name "debugger"? It sounds like you have in mind improvements for debugger UX (such as "preview dispatch target") together with automatic selection of an example context/lexical environment for a given point in the code.

I'd like to see this sort of thing presented as context management, where you can work with a time travel debugger from a recorded program run (it'd be great if you could filter/search for certain contexts with queries like x==true, etc. and possibly have this optimized by building an index for x during the program run), or generate random contexts by static methods (like quickcheck), etc.

These kinds of features are best implemented with static analyses, and thus I predict will be best implemented in static languages that have a better handle on the program structure.

Why time travel when you

Why time travel when you can treat time as a spatial dimension? So you just project code on the execution tree, where it is all conceptually anging around. Of course, you can do tricks like not trace something that is collapsed and recompute execution context only as needed for inspection. Search is then just finding something in a large tree.

Static analysis isn't meaningful if you only care about a specific execution. Well, maybe it's useful in making inspection efficient, but really I see live programming valuable either way. The static typing crowd just might not think this is important enough to put there time there, while the dynamic typing crowd are starting to jump on this. It's more about what people want to spend time on.

Why time travel when you can

Why time travel when you can treat time as a spatial dimension?

What's the difference?

Static analysis isn't meaningful if you only care about a specific execution. Well, maybe it's useful in making inspection efficient, but really I see live programming valuable either way.

You don't only care about a single program trace. For example, if you change a line of code, you don't want to have to rerun the entire program. Static analysis can you get you bounds on how much you have to re-run when things change. Anyway, I don't deny that you can probably do most or all of this in a dynamic language and even that it's probably easier to do some of these things now in existing languages. I don't think that's fundamentally so, though, and think that properly staged ("static") languages will eventually have the upper hand here.

I ... think that properly

I ... think that properly staged ("static") languages will eventually have the upper hand here.

I don't know about that. The main difference between "static" and "dynamic" languages is that "static" languages decide in advance on a set of things you cannot do (Mostly, running the compiler at runtime).

But, if some of those things are capital-H hard to analyze, does that really count as an "upper hand" for static languages? It seems to me that for as long as the program is doing things that are *POSSIBLE* in static languages, it's just as amenable to analysis as programs written in a static language.

Ray

The main difference between

The main difference between "static" and "dynamic" languages is that "static" languages decide in advance on a set of things you cannot do (Mostly, running the compiler at runtime).

The reason I phrased that quote the way I did (referring to "properly phased" and then calling that "static" in a parenthetical) is that my main problem with dynamic languages is not the lack of a type system. I think static type systems are useful but am on board with optional/hybrid systems. The main thing I dislike about dynamic languages is phase corruption. Things like giving every value run-time accessible tags describing what sort of thing it is. It is probably possible to do extra analysis to recover some of that, but I'm skeptical that you can robustly recover all of it.

You mean like boxing in C#?

You mean like boxing in C#? I guess phase corruption is pretty pervasive in any language that reifies type information for use at run-time, but damn its so convenient! This is the one plus feature I got moving from Scala (generics erased) to C# (generics reified at run-time). V-tables also constitute a form of phase corruption that is generally considered essential by most programmers.

V-tables I think of more

V-tables I think of more through the lens of existential types (c.f. gasche's reply below).

What do you think about

What do you think about dependently typed languages then? Or languages with just existential types even? There you have to pass some types around at run time as well, or you have to use an uniform value representation just like in dynamically typed languages. There is still a difference of course; in dynamically typed languages the type tag is always attached to the value, in dependently typed languages the types fly around separately. That may make it easier to eliminate them at compile time in common cases (for example by monomorphization).

Existential types (at least

Existential types (at least those of the ML tradition) do not require runtime type information. They may require boxing if your implementation use different representations for different types in a way that is not reflected by the type system (otherwise you could reflect the representation difference of the existential type as well).

"Dependently typed languages" is a huge field with a lot of different solutions. Some of them break the phase distinction, some of them don't (notably systems with singleton types as the only form of dependency). This is an ongoing research area and I don't think you can expect an answer of the form "I (don't) like them", but rather "I'm interested in pushing them in a direction that preserves phase distinctions".

I agree with the above

And I am using a dependent type system that preserve distinctions.

If I'm not mistaken, you do

If I'm not mistaken, you do need the type information for GC. And if you don't want boxing then you need the type information as well. Though you may not need the compete type information: for GC you just need to know how to walk the data and if you don't want to box them you need to know how to pass such a value around (in practice that means you need to know how many bytes it is).

Warning: the following may well be a bunch of nonsense:

If I understand the working of dependent type systems correctly then it is like this: to type check you normalize types to some normal form. The types may contain free variables which correspond to unknown runtime values. If after normalization the types match up then we accept and otherwise we reject. So we may reject things that would work out fine (because we use syntactic equality on normalized terms, instead of semantic equality). Types are evaluated and checked at compile time. Is there any work on systems that decouple phase distinction from types, so that the phase that something gets evaluated is independent of whether it is a type or not? This normalization business also seems very much related to partial evaluation, as both try to evaluate a program with unknown data as much as possible. Is there a connection? With partial evaluation, the algorithm decides what gets evaluated when. With staged programming, the programmer decides this. So my question is: is there a dependently typed programming language where the programmer decides when things get evaluated and checked?

Look into F* (Fstar) from

Look into F* (Fstar) from Microsoft Research.

I wouldn't count use of

I wouldn't count use of representation information by the GC an example of "phase blurring", as it is essentially not visible by the programmer and probably not described by the language semantics. Such a distinction would make sense in language whose semantics explicitly tackle representation issues, but in this case the treatment of existentials will depend on how representation in general is expressed in the semantics. (For example Haskell has a rather rudimentary treatment of "unboxed types" that may not be used to instantiate polymorphic definitions; similarly existential types won't be able to abstract unboxed types without a boxing step.)

Most current "well-understood" dependent type systems need to reduce terms (program expressions that will hopefully reduce to "values") during type-checking (when deciding equality between two types), which means that program evaluation and type-checking are interleaved. This is a good reason why such systems strive for strong normalization (all programs terminate) -- the other being that logical interpretation of types as propositions rely on their inhabitants, proof terms, to be normalizing. You can try to relax this restriction by:

- separating safe terms that can be evaluated during type-checking from the other (having a distinguished subset of terminating programs); this is what is done by the Trellys project for example

- reducing the expressivity of the type system by weakening the dependence of types on terms; you can have "value dependence" (types only contain fully-evaluated terms, not any terms) as in the F* language from Microsoft, or forget any term/value embedding in types by instead systematically "mirroring" the term values at the type level, as is done in Tim Sheard's Omega language (or see Monnier and Haguenaeur "Singletons types here..." 2009 paper).

"Well understood"

I guess that qualifier was meant to exclude the works of cranks such as myself ;). My approach to dependent types doesn't fit into your characterization of most dependent type systems: evaluation is not required for type checking. Of course, my type checking isn't decidable, either, but I see that as a requirement of any sane programming environment. Consider something like this:

foo: Vec n**2 -> Vec n -> Vec (n+1)**2
foo xs ys = xs ++ ys ++ ys ++ [0.0] 

That definition looks perfectly reasonable to me, but there's a nontrivial proof step required to validate the return type. I want a theorem prover working on that kind of thing, letting me know when it didn't prove things automatically or when it can show that the type is wrong, with a counter-example. I don't want to have to swizzle out a proof with inline combinators or something.

The way things work in my system, something like (Vec n**2) is a type that carries around an expression for n**2. When some other expression is supposed to be of that type, that results in a proposition to checked. Often the check is trivial (the inferred type and type required by context are syntactically identical), but sometimes not, as in the above.

BTW, there's another way that there could be a direct dependence of values on types if you allow something like this:

length: Vec n -> Int
length v = n

My general suggestion is that such things shouldn't be allowed, but some sugar may be useful for this special case where the parameter is a simple variable.

Involving a solver is a good

Involving a solver is a good idea. ATS makes heavy use of solvers for Presburger arithmetic.

To check that (n+1)**2 equals n**2+n+n+1, your solver is quite likely to reduce the first expression mechanically until it reaches the second form (or one equivalent by associativity or commutativity). This reduction here corresponds to the "term reduction" that I was mentioning: when you check types for equality and those types have terms component, you must check term equality, and this may involve arbitrary reduction (~ execution). It's of course less prominent here in a specialized solver domain, but if you allow any kind of terms to appear in type you'll resort to very execution-like equivalence checks in the general case.

We don't want evaluation (do we?)

I don't see how evaluation would work:

(n+1)*(n+1)
 = n*(n+1) + (n+1)  if n>1
 = (n-1)*(n+1) + (n+1) + (n+1) if n>2 
 = ... -- where is this going?

I thought you'd have to insert some manual swizzles to set up the correspondence between Vec (n**2 + n + n + 1) and Vec (n+1)**2. Equality isn't generally decidable in Coq is it?

Even if evaluation does work, it still seems wrong to me. For one, evaluation might take a long time even if it ultimately terminates. More importantly, it would seem to destroy any hope of modularity. What if you only have an axiomatic description of Nat?

I was thinking of reduction

I was thinking of reduction of open terms with unknown variable that are considered irreducible.

Equality isn't generally decidable in Coq is it?

There is a decidable "definitional" equality used by the type system, but it is not very powerful (typically if "+" is defined by induction on its right-hand-side it will be able to infer "n+0 = n" but not "0+n = n"). You can enrich your language with domain-specific solvers (as done in ATS, or in a generic way in the Coq modulo theories work).

Equality checking algorithm do not always evaluate terms all the way down: they can stop as soon as they know for sure that the spine is different (for example "Leaf (factorial 1000)" is different from "Node n1 n2" in a usual binary tree definition). You may still need to evaluate terms fully in the general case and, indeed, that can take a long time, but your approach of using domain-specific solvers as the same problems (for the same reason).

I'm not sure about why you worry about modularity, but I suppose it is simply a misunderstanding on my admittedly fuzzy use of "evaluation" in the context of open terms.

Modularity and type theory

I'm not sure about why you worry about modularity, but I suppose it is simply a misunderstanding on my admittedly fuzzy use of "evaluation" in the context of open terms.

For a function application f e to reduce, you need to know the definition of f. For example, 0 + x is definitionally equal to x only if the definition of + is visible.

Making the definition visible makes writing well-typed programs easier, since you do not need to put in an explicit equality coercion between definitionally equal terms. However, making definitions visible to clients is bad form from a software engineering point-of-view, since it makes the well typedness of a client program dependent on irrelevant details of a module's implementation.

For example, in this case of + we could have done an induction on the second argument, and changing to that should not alter the type-ability of clients.

IMO, this is a nasty tension when programming with dependent types. I expect the ultimate solution is to make conversion user-extensible, so that modules can export an abstract interface without losing the convenience of reduction in types. But given that this is where basically all the difficulty of the metatheory of dependent types lies, it is not an easy problem to solve.

Ad Researchem

Well, you approached a related problem in your work on term equalities for System F. Do you have more things planned in this direction, or has it simply melted into the general research agenda of "developping kick-ass logical relations"?

I do have some plans...

...but if I'm lucky, Conor will do it first and I won't have to. :)

Decidable?

I expect the ultimate solution is to make conversion user-extensible, so that modules can export an abstract interface without losing the convenience of reduction in types.

Do you imagine that these custom conversions will produce a more modular decidable equality? So the programmer would need to understand this equality well enough to know when an explicit isomorphism is required and when it isn't?

What advantages do you see that having over what I'm describing, where extensional equality semantics are everywhere, implicitly, and the IDE lets you know the spots where it couldn't be automatically verified (and also spots where it was verified to be wrong)? How are you going to enforce nice decidable rewrite rules?

It looks to me like your suggestion is to do what I'm doing, but restrict to decidable equality instead of general extensional equality. And then you note that good modular decidable equality for general domains is a hard research problem. My guess is that it won't ever be solved, but your guess is better than mine. Still, why not work with a system like I'm proposing and then take advancements in deciding equality as they come?

Propositional extensionality might help

If I might delurk for a moment, perhaps I can suggest some way to reduce the tension. Let us suppose that we work in a type theory whose definitional equality is a decidable (but closed under at least the "computation rules") underapproximation of the extensional equality with which we reason, but that its propositional equality is indeed extensional (thus ruling out the usual inductive definition of the propositional equality, of course).

A module signature might then define a "reference implementation" of a function, then demand that the structure deliver the "actual implementation" together with a proof that "actual" and "reference" coincide extensionally. An importing module could then compile against the signature alone, using open computation with the reference implementation during typechecking, but then link with the actual implementation for closed runtime computation.

I'm not saying it's a completely done deal (for where is this mythical type theory with canonical evaluation, decidable definitional equality and extensional propositional equality?). It would be nice to do the corresponding "reference implementation" trick for types, but that is likely to involve managing change-of-representation at module boundaries. But there has been useful progress in this direction.

A key result due to Nicolas Oury is effectively that a suitably extensional propositional equality allows a term which typechecks under a definition let x = s : S in ... to be reconstructed abstractly under \ x : S -> \ q : x == s -> ..., with explicit type transportation via q inserted where necessary. So we can work with a reference implementation, then abstract from its intensional details.

Propositional equality might suffice

My suggestion is to use propositions for everything. All reasoning about a symbol should be via a set of propositions known to be true about that symbol. Even the definition of a symbol should be interpreted as a set of solvable constraints. Including a 'reference definition' in your module should just be a matter of including more details. You might not want export full equality, either -- symbols could be defined as quotients.

Combined with this approach, I propose splitting current type systems into two orthogonal parts: the part that I still call types, which tracks true propositions, and the module part that restricts visibility.

The above system I've sketched out is very simple, but doesn't have built in decidable definitional equality. So my question is: what purpose does the decidable definitional equality serve? If I'm writing a program in your language, and I've constructed a value that's extensionally equal to the argument I need to pass (see my example from a couple of posts ago), but I have to manually swizzle it to right type, that's a deal killer. Maybe you're thinking that there would be an unsafe_magic_cast that I could use to bypass the type checker when I'm really in a hurry, but I still say that's the wrong default. IMO the right default is: Accept extensional equality everywhere, show me what's proven and what's not, and get out of my way.

(I'm building this, btw, so criticism would be very helpful)

An elaboration

Be precise about what you want. If I have a function f : (x : S) -> T and I want to apply it to some s : S', under which circumstances can I do so? What is your notion of evidence that S and S' are sufficiently compatible? The relation "S and S' are accepted as compatible by the computer" is your effective notion of "definitional equality", by whatever means you compute it: symbolic evaluation, theorem-proving, or whatever. Unless you're not going to typecheck applications, you will induce such a notion, and you have to face up to the reality that you can't decide extensional equality, so you must either impoverish your type compatibility relation or offer the user some means to work around it. Assuming that we're not choosing the former (and at least I'm not), the question then comes down to which circumstances demand "manual swizzling" and how to present it when the situation arises.

Let me address presentation first. My preference is to distinguish two language levels: a spartan kernel language in which terms carry all the evidence required to be readily typechecked by a small and clearly specified algorithm (so if you don't like my kernel checker, you can write your own), and a high level language supporting a rather more terse style whose programs are elaborated into the kernel by a tool which is preferably predictable in its behaviour although not necessarily trusted. That is, the high level language has judgments t |> t^ : T, "high level term t elaborates to kernel term t^ of (kernel) type T".

Dependently typed languages usually work this way, although by no means all provide a clear definition of both layers. It is important not to judge the design of the spartan kernel by your standards for human accessible code. This spartan language will need to record whatever equational explanations fall outside the kernel checker's accepted notion of type compatibility.

I expect that the kernel will provide an explicit swizzling operator: swizzle(S : Set, T : Set, Q : S == T, s : S) : T. This allows me to transport values between provably equal types. Of course, in my high level language, I should like to avoid explicit appeals to swizzling, but I am quite happy with *implicit* appeals to swizzling. I can elaborate application as follows:

f |> f^ : (x : S) -> T
s |> s^ : S'
Q^ : S' == S
------------------------------------------------
f^(swizzle(S',S,Q,s^)) : T[swizzle(S',S,Q,s^)/x]

leaving the kernel evidence Q^ to be discovered by whatever technology is available. There are no type errors, only equations which the computer is too stupid to see for itself. (Arguably, a type error should be signalled if the computer can clearly see that the equation is unprovable.) It is thus perfectly possible for the high level language to be mediated modulo proof discovery. I recommend it. Perhaps we are even thus far in agreement.

At least two questions then remain. Firstly, what's the type compatibility relation of the kernel language? Secondly, what's the mechanism by which the elaborator seeks the evidence for the swizzling it inserts. These are both "sliders" that it's fun to play with.

For the former, it's quite possible to choose a weak notion, such as alpha-equivalence or even syntactic identity as the kernel's type compatibility relation, but this comes at the cost of requiring explicit evidence for very boring equations. You can chuck in various mechanisms, terminating or not, more or less predictable. If you like, you can choose a mechanism which is not stable under evaluation. Providing a function with a reference implementation which is evaluated symbolically during kernel compatibility checking is one way to specify a decision procedure which is stable under evaluation and sound but usually not complete for the equational theory of the function, but it is not necessarily the whole story, nor does it prejudice the actual implementation used in compiled code, with all swizzling erased.

Whatever notion of compatibility you are willing to let the kernel checker test, you let the elaborator perform the same test and if it's positive, you can record the trivial piece of evidence that allows the swizzling to vanish by computation: swizzle(S,S',trivial, s) = s is type preserving just when the kernel admits that S and S' are compatible. But you can let the elaborator try harder, e.g. by checking if the desired equation is (an instance of) an available assumption. The latter design choice means that you need merely ensure that proofs of the equations you need are somewhere in scope, not right there at that point in your term. But you can go further, making the elaborator try unification, or whatever other theorem proving techniques you like. The elaborator need only complain about equations it's too stupid to solve.

Let me give some reasons why I want the kernel equality to include open evaluation. Firstly, I've gone to the trouble of ensuring that open evaluation terminates, so if I choose to provide a reference implementation, I can safely run it without compromising decidability. If I'm worried about it being too slow, I can decline to provide the reference implementation, keeping typechecking fast and stupid. Programmer's choice.

Secondly, I can teach the elaborator to extend its ability to find evidence for equations by reflection. Suppose I have a decision procedure for some equational theory on a type T, e.g. some fragment of arithmetic useful for lengths of vectors. That is, I have

* some datatype of open expressions indexed over contexts, L (G : Context) : Set,
* an evaluator eval(G : Context, e : L G, g : Environment G) : T,
* a test LEquiv(G : Context, e : L G, e' : L G) : Bool, and
* a proof LEquivSound(G : Context, e : L G, e' : L G, p : IsTrue(LEquiv(G, e, e')), g : Environment G) : eval(G, e, g) == eval(G, e', g)

then the elaborator can generate evidence that t == t' in T by quoting t and t' as e and e' in some environment g for context G (a structural traversal of their syntax), so that eval(G,e,g) = t and eval(G,e',g) = t'. The proof is then exactly

LEquivSound(G,e,e',trivial,g) : t == t'

which typechecks if LEquiv(G,e,e') evaluates to true, always provided kernel equality is willing to compute.

We can shift work from human to machine by using reflection to deliver evidence for propositional equality by heavy duty use of computation in kernel equality. Thus was the Four Colour Theorem proven!

So yes, make propositional equality extensional; yes, get equational type swizzling out of my face in high level programming; but no, don't give up the kernel's power to compute when testing type compatibility; and yes, allow reference implementations as cheap sound equational theories; but then, use reflection to export useful decision procedures from modules explaining how to solve constraints involving the functions provided.

All of these technologies exist in at least prototype form, but not all working together in the same system. What a pity.

Thanks

Conor, thanks for the detailed reply. I haven't done a very good job of communicating the interesting features of my project, which I should try to do, but for now let me say that I agree that most of your proposal above seems pretty reasonable. Here are the main issues/questions I have:

1. What are the semantics of the program while it's in a state of having a compiler too stupid and a programmer too lazy or stupid to fill in the swizzles? My position is essentially that dependent types should be relegated to tracking propositions about a program and should not influence evaluation. Answering the question of your first paragraph ("...under which circumstances can I do so?"): you can always do so and you'll get feedback telling you when you botched the core types (passed a list as an integer) and telling you where propositions could not be proven. The important property is that as long as all of the inferred propositions are true, your program will run correctly without them ever having been proven either by you or your tools.

2. What's the purpose of the kernel language? Is it to satisfy the de Bruijn criterion? It seems to me that for that purpose we want utmost simplicity so that we most easily verify that our proof checker is working and that our logic is sound. But if we're talking about a language that developers are going to drop down into during day-to-day work to check that they really understand what's going on, then a shallow translation from high level becomes a more important criterion, I think. And we can bootstrap from the former to the latter using reflection. The purpose of the latter is really just communication; a good implementation of the high level language would probably be proven correct and would skip generating any kernel code or proofs except when the programmer wants to inspect them.

3. Speaking of reflection, it's certainly cool the way that works in type theory, but do we really need that trick? We can just add an explicit 'reflect' node to a proof to cause evaluation, and it sounds like the surface syntax in the high level language would be more or less unchanged. Having it explicit probably even makes it easier to e.g. pre-compile the decision procedure.

4. Open evaluation means partial evaluation in the presence of free variables? What does one have to give up in order to have terminating open evaluation? I'm a little skeptical of the utility of this reference implementation idea because: a) it's only benefit from my point of view is to improve automation of swizzling, which is useful, but, b) it's only useful when the specification you have in mind for a function is "exactly the same as this other function", and c) that means it will most apply to little helper functions that are probably most amenable to automation by simple rewrite rules.

Also, this paragraph was mildly humorous:

Firstly, I've gone to the trouble of ensuring that open evaluation terminates, so if I choose to provide a reference implementation, I can safely run it without compromising decidability. If I'm worried about it being too slow, I can decline to provide the reference implementation, keeping typechecking fast and stupid. Programmer's choice.

So you've proven it terminates, and if it doesn't the programmer can always choose not to run it. :) This ties in to another one of my opinions, which is that general recursion should be allowed in any function. That should introduce a new proposition to check, but it should be ignorable like any other proposition. Running such unchecked functions at compile time should pose no problem -- if you want to crash the compiler, you can also do that with the Ackermann function.

"Reference implementation"

I like the idea of a reference implementation. When you want to say something about a complicated functional program, just saying "it is equivalent to this simpler program" is often a nice way to specify it's behavior. It comes from the fact that a large part of the behavior we want to specify is precise enough to capture a (naive) implementation.

(We often recognize this situation in a negative way, when the specifications are complex enough to look like an actual implementation rather than a specification. This often happens when the problem is shallow, and in this case it means we haven't added much confidence. It's more useful the implementation we're trying to check correct is interesting for its intentional aspects such as performance -- at the cost of more complexity wrt. the native implementation.)

I distinctly remember a paper about Adam Chlipala using the same idea of "equivalence with a naive,pure implementation" as an efficient specification device. Probably part of either the Ynot or Bedrock effort, but I can't find the paper back.

Your idea that this style of specification also allows for convenient use outside module boundaries, respecting some form of abstraction, is a very nice addition.

Lack of a type system?

my main problem with dynamic languages is not the lack of a type system.

I know you said that this is tangential to your main point, but I don't understand what idea it is that you're equating with "lack of a type system." Could you please explain what you mean here?

With the exception of assembly languages and one very pure/primitive FORTH dialect that knew only about bits until you defined a word for defining types, I don't think I've ever seen a programming language, dynamic or static, that was entirely lacking a type system.

Also, I don't see how it's a dynamic/static distinction; you can have dynamic languages with strong type systems and static languages with no type system at all (the aforementioned assembly language on a machine with separate memory pages for code and data counts as static).

The only real static/dynamic distinction I can think of regarding type systems is that defining new types is not generally possible during runtime in a static languages. Somewhat less real (cultural, perhaps) is that it is traditional in dynamic languages for values to have types (thus, typetags are part of the runtime representation of values) and traditional in static languages for variables to have types (thus typetags are part of the compile-time symbol table). But there are certainly exceptions to both traditions(union types in static languages and typed variables in dynamic languages), and I think the exceptions are becoming more popular recently.

By type system, I meant

By type system, I meant static type system. I think this is common usage in circles closer to mathematical type theory, but sorry for the confusion -- I should have worded more carefully.

It doesn't really make sense to talk about defining new types at run-time in static languages, unless you have several stages, in which case it is possible.

Okay. But it's a bit of a regress....

What attributes distinguish a "static" type system from other type systems used in programming languages? Is it the distinction between values having types versus variables having types, or is it the ability to prove before starting execution that you won't ever have to do a runtime type check, or is it something else?

I ask because I can think of exceptions to both of those arising in dynamic languages (and, indeed, I can think of instances of both of those occurring in languages widely considered static).

Is there a clear formal criterion like the distinction between LL1 and LL2 grammars, or is it just a general perception or style whose definition is sort of cultural/vague, like "Object-oriented?"

Ray

Static and dynamic

A common way to think about this is in term of "static" and "dynamic" semantics. (Static) types are about what properties of a program we can predict without actually running it, while dynamic semantics describe how to run it. You have to check that they agree, eg. that static predictions are actually always true.

Formally, all programming languages have both static and dynamics semantics, but sometimes the static semantics of the "official definition" are so simple as to be trivial (or at least only check variable scoping). This explain the vitriolic claim that "dynamic" languages are unityped. It's fine because dynamic semantics are also a very interesting topic that can give full-time work to language designers.

Is there a clear formal criterion [..] or is it just a general perception?

If you formulate the question in a formal setting, there is a clear formal answer to it. I have the feeling there is a strong consensus on the topic, at least in the research community -- this is not the case on the "object oriented" definition for example.

Bruce Findler has done interesting work that tackles this question formally, such as 2007 "Operational Semantics for Multi-Language Programs" with Jacob Matthews, describing how to mix statically-typed Java and dynamically-typed Scheme code in a single program. It's very neat.

The distinction between

The distinction between static and dynamic is rather simple. If you are staring at the code without regard to a specific execution context, trying to reason about the code's general properties, that is static. If you are looking at the code executing in the debugger, that is dynamic.

Static types is a documented conservative approximation of what values could be bound to the typed symbol. You can use that for static type checking if you want, or just to reason about the code in general without a specific execution context. The dynamic type is actual type at run-time, often used for dynamic type checking but may also intrude on execution semantics.

Static types are not just

Static types are not just about straight-jackets! Static types are important enablers of static analysis because they document, abstract, and encapsulate the shape of computations. Otherwise, you have to reverse engineer everything from code, which is a much harder problem then having the programmer do it themselves! This analysis isn't just about saying no, it might be able providing enough context to do something like code completion.

Now, static type systems are not very expressive, and tend to become not very useful as the analysis becomes more sophisticated. Dependent type systems try to become more sophisticated, but their exacting natures means theorem proving.

Running the compiler at runtime...

Multi-Stage Programming allows this in a type-safe manner.

You do care about only a

You do care about only a single trace when debugging. The idea is to get the program working, not get it working perfectly. When you change a line of code, you want enough of the program to rerun to get an idea of how the change affects the result. In a dynamic or static language, you can trace dynamic dependencies on the initial run, which are going to be way more accurate than anything you could compute conservatively doing a static analysis! (SuperGlue does that) Perhaps we can replace the accurate dynamic dependency trace with a static trace to improve efficiency by a constant factor, but...I don't see that being a game changer (constant factor...).

We can also structure the language to support stronger forms of state/context encapsulation that make refresh more efficient. Again, using the static type system is convenient in this case, but not really necessary (you can encapsulate in a dynamic language by preventing values from being accessed dynamically).

Actually, rerunning the program is still often faster than the time it takes to run a sophisticated static analysis that can give you more feedback than the "types are write" (unless we are talking about a sophisticated dependent types, of course).

Personally, I think static typing has many benefits and I don't see myself designing a purely dynamic language anytime soon. But pragmatically, static languages only provide some small percent advantage over dynamic languages (my main example being code completion), especially for those use cases where you want to write and iterate on a program quickly and are not into too much up front thought and design, cases where live programming is going to be especially useful.