The Next Stage of Staging

The Next Stage of Staging, by Jun Inoue, Oleg Kiselyov, Yukiyoshi Kameyama:

This position paper argues for type-level metaprogramming, wherein types and type declarations are generated in addition to program terms. Term-level metaprogramming, which allows manipulating expressions only, has been extensively studied in the form of staging, which ensures static type safety with a clean semantics with hygiene (lexical scoping). However, the corresponding development is absent for type manipulation. We propose extensions to staging to cover ML-style module generation and show the possibilities they open up for type specialization and overhead-free parametrization of data types equipped with operations. We outline the challenges our proposed extensions pose for semantics and type safety, hence offering a starting point for a long-term program in the next stage of staging research. The key observation is that type declarations do not obey scoping rules as variables do, and that in metaprogramming, types are naturally prone to escaping the lexical environment in which they were declared. This sets next-stage staging apart from dependent types, whose benefits and implementation mechanisms overlap with our proposal, but which does not deal with type-declaration generation. Furthermore, it leads to an interesting connection between staging and the logic of definitions, adding to the study’s theoretical significance.

A position paper describing the next logical progression of staging to metaprogramming over types. Now with the true first-class modules of 1ML, perhaps there's a clearer way forward.

Is this the way forward?

Do we really want to go to all the trouble of the techniques in the paper to manually implement inlining of functor applications? Given that the need for this kind of optimization is pervasive if you liberally use module abstraction, this approach seems way too verbose. An 'inline' keyword or simply guarantees about when inlining will occur seem more reasonable. Are there other better examples of how this technology would be useful?

MetaML as an "asm of staging"

I don't think the goal is to propose that people actually use constructs like these directly in this form. Rather, the goal is to explore the issues and solutions around staging. Once staging is better understood, we can abstract over it with pretty syntax and move towards a more tacit approach. If you look at MetaML, it has quote/unquote/run primitives, but then compare it to Lightweight Modular Staging, which dramatically alters the programmer experience so that you don't directly fiddle with those primitives.

MetaML is an asm of staging,

MetaML is an asm of staging, but it is proposed for actual usage. There is a well-known more tacit approach—partial evaluation—but we also know well that it doesn't work transparently; you need to fiddle with your program so that it partially evaluates well, so overall we have mostly moved away from partial evaluation to staging.

With LMS, you still have to write some staging annotations (Rep), and understanding the staging behavior of the code requires inferring the annotations anyway, so it's not clear that it's always an improvement — though it can be more convenient to write.
This seems like one of the many cases where you want to write one language and read another — if only projectional editing were already practical...

I interpret LMS's use of types as a way to reuse type inference for binding-time analysis. (BTW, I'd be curious how this compares to research on binding-time analysis through type inference).

partial evaluation—but we

partial evaluation—but we also know well that it doesn't work transparently

Having the ability to specify 'inline' at a functor application site seems like an easy way to direct the partial evaluation.

if only projectional editing were already practical...

I don't think that's an ideal solution, either, since it combines the staging plan with the base semantics. What if you want to stage a piece of code multiple ways?

Staging doesn't work for me.

Personally I find staging to be the wrong approach. I think programs should be compiled and verified before deployment. I think that for dynamic polymorphism there are simpler and more direct answers. What do people consider a good example that requires staging?

Could you elaborate? Here

Could you elaborate? Here are some partial (and somewhat obvious) answers, with the caveat of "I'm not sure what you mean". It seems that either you think of something else for staging, or you have objections against each of them, which I'd like to hear.

I think programs should be compiled and verified before deployment.

Why does that conflict with staging? I suspect you're thinking of MetaML's run, but you can have staging and never use that: you generate a code value and turn it into a program that you ship. In a sense, "staging" is about "separating stages", while "run" is about "mixing them back" by interleaving them (it's usually also considered part of staging, because in some applications it does make sense).

I'm not sure what you mean by "verified". Are you afraid of compiler errors at runtime? There are quite a few staging systems that make them impossible, though there might be an open conflict between absolute guarantees (like given by MetaML) and being able to transform staged programs (as in LMS). I think we know how to solve both with dependently typed metaprogramming, but we can argue whether that's practical.

I think that for dynamic polymorphism there are simpler and more direct answers.

What's "dynamic polymorphism"? Again, for many applications of staging, "dynamic" sounds odd, since you can often collapse everything into the last stage.

What do people consider a good example that requires staging?

You might be familiar with C++ template metaprogramming. Staging is an approach designed to do at least the same (and more), where you use the same programming language at both levels. Moreover, in some approaches (like LMS), you can add domain-specific optimizations for high-performance computing.

A recent spectacular application of staging is building database systems and query compilers by staging:
http://infoscience.epfl.ch/record/198693

Otherwise, the naive answer is to recommend a literature search, starting for instance from http://tiarkrompf.github.io/ or Oleg Kiselyov's work, e.g. the paper on the Shonan Challenge (http://okmij.org/ftp/meta-programming/HPC.html#shonan-challenge). But I can't really presume that you're not aware of work by one of your coauthors, hence my suspicion that you have objections.

I prefer two levels with different languages.

I mean runtime polymorphism, or dynamic/runtime dispatch (i.e. virtual methods etc.). I have read Oleg's work on the Shonan Challenge. I should probably clarify my understanding of staging.

You have program1 which when run outputs program2, which when run outputs program3 etc. We could say interpreted programs (without 'eval') have 1 stage, and the above example has multiple stages.

I think exactly 2 stages is optimal. I think statically checking properties at compile time (the type system is a program which gets evaluated at compile time, and we only output the compiled program if type checking succeeds) is good. I think parametric generics are good (where the compiler outputs a specialised multiplier for N*M matrices, for example). But I think more than 2 stages obfuscates the purpose of code and makes it harder to understand, not easier.

I dislike the merging of type-level and term-level into the same language, as it leads to chains of programs of indefinite length. I prefer a logic-language for the meta-level as you need to be less concerned with the 'how' something works, and a functional/imperative language for the value-level, and to stick to exactly two levels.

availability of information

For many applications, there is some information that is, fundamentally, not available until runtime. With a type/term split, you have to decide which subsets of information are available at which times. If some information is unavailable until runtime (i.e. "dynamic"), then it's a requirement that you program with it at the term level. Otherwise, you (and many others) would prefer to hoist it to the type level, so as to maximize verification.

The problem comes in when you have data that is *usually* static, but *sometimes* dynamic. For example a tuning parameter for a database. Suddenly, you're faced with some options:

1) make it static, force users to recompile your app to configure it
2) make it dynamic, accept the performance loss in your static runtime of choice
3) make it dynamic, cross your fingers and hope that your JIT inlines it
4) utilize staging (or similar) constructs to dynamically demand that it become static at a particular time

Personally, I think the opportunities afforded by option #4 are super cool and well worth studying.
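
To make option #4 a bit more concrete, here is a minimal sketch in BER MetaOCaml syntax (the function names, the environment variable, and the workload are all made up, and it assumes `Runcode.run` from that system): the tuning parameter is read at runtime and then baked into freshly generated code as a constant.

(* Hypothetical generator: given the block size (known by now), produce
   code for the inner loop with the parameter spliced in as a constant. *)
let process_gen (block_size : int) : (float array -> unit) code =
  .< fun data ->
       for i = 0 to Array.length data / block_size - 1 do
         for j = 0 to block_size - 1 do
           data.(i * block_size + j) <- data.(i * block_size + j) *. 2.0
         done
       done >.

let () =
  (* The "usually static, sometimes dynamic" parameter arrives at runtime... *)
  let block_size = int_of_string (Sys.getenv "BLOCK_SIZE") in
  (* ...and becomes static from here on: run compiles the specialized
     code once, so the hot loop sees a constant block size. *)
  let process = Runcode.run (process_gen block_size) in
  process (Array.make 1024 1.0)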

The bigger problem comes in when you decide on either static or dynamic and later change your mind. Whoops! Need to rewrite it from type to term language, or vice versa.

edit: I want to add that I must have databases on the brain thanks to the research linked above about query engines. In those cases, you want to specialize the query evaluator to the query itself. And, of course, the query isn't available until runtime!

Verification after leaving the factory is useless

I want static verification because the developers can do something about it. Especially if the static verification is of the proof kind, i.e. prove that this sort function always returns correctly sorted output. This is useful for the programmer to be aware of errors before shipping.

We then compile and ship the verified code.

Any static errors after that are useless; the user won't know what to do with them. You need proper validation checks and responses in the code itself. This can be enforced by the static checks, using refinement types. So, for example, 'if x < max then ...' refines the type to (int < max); we can statically ensure the appropriate dynamic checks are in place before shipping code. The developer can then take user-friendly corrective actions if the dynamic test fails.
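
For what it's worth, here is a rough OCaml approximation of that pattern (not real refinement types; the module and names are invented): the only way to obtain a "checked" value is to go through the dynamic test, so code that skips the user-facing check simply doesn't type check.

module Bounded : sig
  type t = private int                    (* an int known to satisfy x < max *)
  val check : max:int -> int -> t option  (* the one place the dynamic test lives *)
end = struct
  type t = int
  let check ~max x = if x < max then Some x else None
end

let handle_input ~max raw =
  match Bounded.check ~max raw with
  | Some x -> Printf.printf "accepted: %d\n" (x :> int)
  | None   -> prerr_endline "value too large, please try again"  (* user-friendly response *)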

I am also not keen on making the compiler a dependency of shipped code.

I am also not keen on making

I am also not keen on making the compiler a dependency of shipped code.

A full compiler isn't necessarily needed.

Elaboration

Isn't that like Ada elaboration for generics? I don't necessarily think that is a bad idea. But to stage the full language would require the full compiler (except the parser).

The paper I linked supports

The paper I linked supports dynamically generating code for a meaningful subset of the Cyclone language, in a type safe way. The runtime support required is just memcpy to stitch together the template fragments IIRC. It's quite a bit beyond Ada's generics. You could consider this a compiler in some sense, but not in the sense that you're using the word, ie. no optimization infrastructure. All of that optimization is done to the template fragments themselves at compile-time.

Elaborating on Elaboration.

Stitching together pre-compiled generic blocks sounds exactly like elaboration. Maybe this is more sophisticated than Ada's elaboration, or maybe you underestimate what can be done with Ada generics?

Performance is going to be limited by the lack of specialisation, but this does not sound as bad as including a dependency on the full compiler. It sounds interesting to look at what the limitations are compared to other staging proposals. My feeling is this approach could work with what I am doing, and is probably as far down the staging road as I would want to go.

Maybe this is more

Maybe this is more sophisticated than Ada's elaboration, or maybe you underestimate what can be done with Ada generics?

If I were to compare it to something, the technique is more like selective inlining (effectively, code-copying JIT). Ada generics are much less expressive by comparison, although perhaps the techniques they use to implement Ada generics are similar.

Performance is going to be limited by the lack of specialisation

I don't doubt there will be some performance degradation. I skimmed the paper again, and they make some effort at optimizing the template fragments so that assembled programs are efficient.

Good point

about verification being useless in generated code.

As for "making the compiler a dependency of shipped code" - in the real world, outside of academic mumblings, every system is moving to having a just in time compiler, which means that the compiler is always available at runtime.

Though they don't handle low level concurrency, trace compilers are getting popular which means that untyped code can run efficiently by being specialized for the types seen at run time.

So not only is static checking not all that useful for a second stage, one can choose a language where that isn't even an issue without much cost.

Correctness

There is also the point that it is just hard to think about the correctness of code generated by other code (debugging C++ templates is hard enough) without even thinking about code generated by code generated by code generated by code etc...

Staging is easier than templates

MetaML staging annotations can't change correctness at all (IIRC), and LMS ones typically don't.
C++ templates are a horrible form of staging (if they qualify at all), since moving code among stages amounts to rewriting it. I'm sure you must know that, so I find your comment surprising.

(OTOH, I think your points against staging are interesting).

Better Staging

I'm certainly interested in staging that addresses my concerns. What would be nice would be a clear example where the staged program is simpler and more understandable than an unstaged program that achieves the same thing.

I still think a logic language is better for meta programming because if I care about an algorithm I want to implement it at runtime. When I am meta-programming I want to focus on the 'what' not the 'how'.

The way I am approaching it, you code generic components in a fast compiled functional/imperative language, and then the logic language connects and configures the generic components. My original thought was that this happens before final optimisation and linking. I think I can accommodate something like staging by allowing the generic components to be run-time loaded, connected and configured by the logic program, and the runtime would only be expanded by the size of the tiny logic interpreter.

already verified & just ship the compiler

I didn't mention verification, only specialization. Part of the research being done on staging is about how to write generic code that can be parameterized by stage. You verify code before specializing and, importantly, preserve the validity of those verifications after specialization! The premise is that you type check against an `int code` instead of an `int`, and then when you specialize it later, you can be confident that the result will be correct.
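
As a concrete illustration of that premise, here's a small sketch of the classic power example (mentioned elsewhere in this thread) in BER MetaOCaml syntax, assuming its `Runcode.run`: the generator is type checked once at `int code`, so every program it produces is well typed by construction.

let rec power_gen (n : int) (x : int code) : int code =
  if n = 0 then .< 1 >.
  else .< .~x * .~(power_gen (n - 1) x) >.

(* Specializing later yields, e.g. for n = 3, code equivalent to
   fun x -> x * (x * (x * 1)), still of type int -> int. *)
let power3 : int -> int =
  Runcode.run .< fun x -> .~(power_gen 3 .< x >.) >.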

As for taking a dependency on the compiler: Who cares? A great deal of software that you use on a daily basis already depends on evaluators being pre-installed: Scripting languages like Python or Ruby need their interpreters or even sophisticated just-in-time compilers like those for Java and JavaScript. But even if you have to ship the compiler with your application, the total overhead can reasonably be measured in single-digit megabytes.

Further, staging can simplify your primary evaluator too! If you design a language/compiler/VM with something like LMS & https://github.com/tiarkrompf/lancet from the ground up, you can probably cram a sophisticated JIT into a fraction of the code. Hell, you can specialize your compiler to your program and cram the compiler into the runtime for the cost of practically no space at all. Add to that previously completed verification, and there's low security risk too.

Staging seems like a no brainer to me...

Lancet looks interesting

Does Lancet do much of what's in the paper in the article?

lancet built on LMS

Lancet is built on Lightweight Modular Staging (LMS). These practitioners and researchers collaborate... This is all very new stuff.

Performance

Call me when JavaScript is faster than C/C++.

Actually there is a serious point here, could a JIT that generated non-garbage collected, unboxed code out-perform compiled code?

Critical factors for me are: performance, memory-use, and code readability.

fallacious challenge

Java, JavaScript, etc had to embrace and explore JIT technology due to designs which make 100% static compilation impractical or otherwise unattractive.

If you design a language with a sliding scale of static/dynamic compilation in mind, it's certainly not going to look anything like JavaScript, and it's quite likely that it can be faster than C++.

However, this stuff hasn't been well studied yet... which is one of many reasons why I support work on staging.

changing subject

Your point above talked about JITs being a use for staging. My question was about JITs. Can a JIT compiled language be faster than a statically compiled one? I asked because I am interested in opinions on the subject. Maybe Java or C# would have been better examples than JavaScript?

The justification for staging seems to be performance-based, as you can write a flat program to perform the same task which I think will be more readable.

When working on HList for Haskell, it was obvious there was a problem with lifting a value to the type level (and therefore with dependent types). There appeared to be two solutions: staging, i.e. deferring compilation of all code whose type could not be determined until runtime, when we have the value (and in a recursive program this cycle can go on indefinitely); or not lifting the value to a type at all, but using path-dependent types. This enables the same functionality to be achieved completely statically.

Ill-defined question

Can a JIT compiled language be faster than a statically compiled one?

I think this question is ill-defined.

Taken literally, the answer is yes: some good JITs (eg. LuaJIT) will consistently outperform bad compilers (eg. Python bytecode compilation followed by a braindead translation of each opcode to assembly) on typical benchmarks.

You can even make the answer true with the same language in input to both implementations: Facebook's PHP JIT that targets their HHVM machine reportedly outperforms their previous purely-static compiler (HipHop).

Maybe you want to ask if *some* JIT implementations can beat *all* purely-static compilers for a fixed language you have in mind, or a known class of "fast" language. Could we implement a JIT for C++ that outperforms statically-compiled C++? Could we implement a JIT for some language that outperforms C++?

At least there is evidence that Profile-Guided Optimization (PGO), which is the closest thing to JIT that current compilers support, does bring noticeable performance improvements over the usual compilation pass. See for example this, this, or that for above-10% improvements across systems and compilers in the Firefox community, or this example of an around-10% win when compiling the Python interpreter with PGO enabled.

From these results I *conjecture* that profile-guided optimization (whether ahead-of-time or just-in-time) can reliably bring performance gains to C++ programs.

Profile Guided Optimization

I already use PGO because it works well and gives a noticeable performance improvement (20-50% in my use case).

However PGO works well because the instrumentation and compiler are not distributed with the finished code.

The problem is that the instrumentation slows the code down, so I don't want PGO instrumentation in production code. On that basis, I don't think it's quite that easy to claim that JIT-PGO will be a win.

JIT Instrumentation slows the code down [citation needed]

The compiled code does not contain instrumentation any more, so that claim needs data. If JIT authors can fit execution of the instrumentation code into memory-induced stalls, the performance overhead would not just be insignificant; it would be exactly zero in many scenarios.

Back in the days, my JIT course recommended reading "A Survey of Adaptive Optimization in Virtual Machines" (http://mips.complang.tuwien.ac.at/andi/vm-survey.pdf), with a discussion of the tradeoffs. At a quick glance, at the time (2005), profiling overhead was a challenge, but actual VMs did keep it low, at the expense of the quality/quantity of gathered information.

More generally, armchair theorizing works only for stuff with elegant models — models one can reason about in one's armchair. Arguably, building things with such models (that is, elegant languages and theories) is a goal of PL theorists, but not so much of microprocessor designers ;-).

Can a JIT compiled language

Can a JIT compiled language be faster than a statically compiled one?

If we're talking about static source without any specialization for dynamically discovered data, a static compiler should beat a JIT of equal quality, simply because the static compiler isn't in a hurry; it can toss in a lot of extra optimization work.

Where JITs (or runtime staging) have a potential advantage is working with dynamically constructed (yet reasonably stable) loopy functions. This isn't uncommon in certain domains (e.g. rendering of user-constructed images or music, robotics systems where you develop then execute plans but occasionally replan, machine learning to auto-calibrate processing functions for available sensory devices and environmental conditions) or for certain ways of writing software (e.g. dependency injection, FRP with some behaviors changing on rare events, systems that make abundant use of configuration files, writing interpreters, plugins where plugin code is implemented against an abstract monadic typeclass and could be better specialized to a concrete type).

There have been many times where I have a Haskell system with hundreds or thousands of composed functions each curried with a little runtime data, where performance is barely adequate but could be blazing fast if a compiler could specialize and optimize the whole composite.

Really, I'd be happy to just have a `specialize` function that I can apply explicitly rather than an ad-hoc tracing JIT. I'd be happier if this function behaves reliably and allows me to tune the optimization effort.

Occasionally, I encounter cases where user-controlled static staging could help. But that's been less common for me, probably due to the kinds of projects I've worked on.

You're looking for what Lancet provides

Really, I'd be happy to just have a `specialize` function that I can apply explicitly, rather than ad-hoc tracing JIT.

I'm not sure Rompf's Project Lancet (https://github.com/tiarkrompf/lancet) received the needed engineering effort to be practical, but it seems exactly what you're asking for. What do you think?

Lancet

Lancet looks neat. As you say, it needs engineering work to make it robust and practical. If it were for Haskell / GHC, I'd give it a try. Based on impressions from my Twitter acquaintances, I believe I'll remain happier if I avoid Scala.

Paper

I'm more curious about your impression of the paper — is the interface enough for what you need? Maybe somebody can get Lambdachine and Hermit to join forces :-). In fairness, addressing your scenario was IIRC a design goal for Lambdachine (https://github.com/nominolo/lambdachine), so maybe that's what you want to try.

Based on impressions from my Twitter acquaintances, I believe I'll remain happier if I avoid Scala.

ROTFL man, that made for a great quote (which I did: https://twitter.com/Blaisorblade/status/584031758804508672). I also wish for a better Scala-like language, so I totally see your point.

403 Forbidden Lancet

I'm getting a 403 (Forbidden) on the Lancet links. I wonder if it was recently blocked or taken down. ACM has a copy for $15, but I'm hardly willing to pay just to peruse details of a project I won't use.

Best I can find is a detailed blog article and the more generic paper on LMS.

Mail the authors

You should just send an email to the authors, saying you are interested in their work, asking if they could put it back online. I would guess more likely (than a block or takedown) that the URL changed.

(It's not in your interest to pay for a copy, because it helps make this exploitation system profitable.)

You can find those slides online.

The author moved — here's Green Open Access

gasche's guess seems right: The PDF was hosted at EPFL, but the author moved elsewhere, and the old homepage disappeared altogether.

As others have remarked, green open access doesn't fully work, because most of us academics aren't disciplined enough (we weren't hired as librarians, after all).

Meh

As others have remarked, green open access doesn't fully work, because most of us academics aren't disciplined enough (we weren't hired as librarians, after all).

I think this is an observation that goes in the wrong direction, and that we would be better off by agreeing as a community that it's our responsibility to make our articles accessible online (whether by actually being reactive in hosting them online, or delegating to arxiv).

I mean, at least where I work, researchers are tasked with plenty of things they were not hired for (many of an administrative nature), and putting PDFs online in a place search engines can find them is among the less annoying and time-consuming. On the other hand, the current state of affairs, which supposedly lets professionals handle this aspect, is, to many, deeply unsatisfactory.

I would be glad to email Tiark Rompf once in a while to get his PDFs back online if it means the undergraduate students of the future don't have to jump through hoops to access research articles (I had to; this is not an imaginary concern). This may "not fully work", but I think that remark misses the point.

What do people consider a

What do people consider a good example that requires staging?

Consider in which language you'd write MATLAB, where users can enter algorithms dynamically for which you'd want to generate highly specialized code at runtime. I don't see how it's possible to express and statically check such a program without general staging.

Type Checking

Surely you just construct a syntax tree for the language, and write a type-checking / inference algorithm. So a language that makes it easy to express abstract syntax and type checking/inference algorithms seems useful.

The bit it seems hard to do without is the code generator; otherwise you are limited to an interpreter, or to writing a compiler back-end, which is a big task.

Once you are implementing a

Once you are implementing a language, I'm not sure how useful staging is. A DSL kit like Spoofax might be more appropriate if low effort is desired.

I think it's pretty well

I think it's pretty well established that staging provides significant utility on this matter. For one, you exploit the host language's code generation capabilities to provide a JIT for the hosted language, all of which is type safe.
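
A small sketch of that point, again in BER MetaOCaml syntax (the toy expression language and all names here are made up): the same evaluator, written over code values, becomes a compiler for expressions received at runtime, and the host type checker guarantees the generated code is well typed.

type exp = Lit of int | Var | Add of exp * exp | Mul of exp * exp

(* Ordinary interpreter: re-traverses the AST on every call. *)
let rec eval e x = match e with
  | Lit n -> n
  | Var -> x
  | Add (a, b) -> eval a x + eval b x
  | Mul (a, b) -> eval a x * eval b x

(* Staged interpreter: traverses the AST once, at code-generation time. *)
let rec eval_gen e (x : int code) : int code = match e with
  | Lit n -> .< n >.
  | Var -> x
  | Add (a, b) -> .< .~(eval_gen a x) + .~(eval_gen b x) >.
  | Mul (a, b) -> .< .~(eval_gen a x) * .~(eval_gen b x) >.

(* "JIT" an expression entered at runtime into a plain OCaml function. *)
let compile (e : exp) : int -> int =
  Runcode.run .< fun x -> .~(eval_gen e .< x >.) >.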

I see the usefulness, I'm

I see the usefulness, I'm just not sure that's a big enough market, as it is stuck between low and high effort.

A market that doesn't exist

A market that doesn't exist in a viable form is hard to predict. LLVM sees quite a bit of use, and staged languages could displace LLVM in some domains due to the extra safety.

People who are using LLVM

People who are using LLVM are pretty committed and are probably writing a full stack of language components anyways; they can afford specific solutions. I was thinking more the DSL crowd, but they have their tools also.

Honestly, I could probably get some benefit from it, already being a heavy user of dynamic code generation via the DLR (part of the CLR). The DLR will check your tree types before compiling them, and if they aren't well typed... well, the error messages ain't pretty. But even then, it is quite easy to abstract and modularize code generators.

Incompatible type system.

If the DSL has a completely different type system (say intersection types) that cannot be embedded in the host language type system, then you would need to implement the type inference/checking anyway. Staging only provides an advantage where the DSL has a type-system that is a subset of the host language type-system.

You need a good EDSL host

Staging enables a low effort backend for a (deeply?) embedded DSL — whether you can embed your language is an orthogonal question. The market for embedded DSLs seems very much alive in practice *within suitable languages*, and research is evolving both guest and host languages. Dependently-typed languages would remove the restriction you mention; sooner or later, we'll learn how to deal with the problems they introduce.

Module inlining

Can someone provide an example use of staging that can't be handled by module inlining? I'm thinking the semantics would be that you keep rewriting until all references to symbols in the module are eliminated. Here's an example that shows you need constant folding and dead branch elimination for this to work:

foo = struct {
    power n x = 
        if n=0 then 1
        else x * power (n-1) x
}

import foo inline
power0 = \x. foo.power 0 x
  --   = \x. if 0=0 then 1 else x * foo.power -1 x
  --  Divergence if we don't eliminate the dead branch

It hazily seems to me that all uses of staging might be factored this way into a module of stuff that's to be eliminated at one stage and then the rest of it. Even more hazily, it seems like this might be able to support more than 2 stages, by allowing inlining of a module in an abstract setting that will itself later be inlined. Failure of the expansion to terminate under constant folding and dead-branch elimination would correspond to failure of the staged program to terminate.

Thoughts?

Partial evaluation

You are willing to add additional specializing optimisations after module inlining. In effect you are doing (a restricted form of) partial evaluation, which is known to be more expressive than staging, but also much much harder to control, reason about, and get reliable performance improvements out of. Staging makes the specialization frontier explicit to the programmer, and that is the point.

A restricted form of partial evaluation

but also much much harder to control, reason about, and get reliable performance improvements out of

Well, the conjecture I was making is that this restricted form of partial evaluation is as easy as staging to control and get reliable performance improvements out of. In fact, in cases where your goal is 'abstraction without guilt' it would seem to be easier to control.

I agree with you, though, that the reliance on certain rewriting semantics makes it seem less nice than staging. It means that evaluation in the early stage language is some term-rewriting process instead of just ordinary evaluation. This also comes up with dependent types.

It actually seems that staging might fit in well with the representation selection mechanism I'm building: a staged type could be chosen to represent a plain curried type.

But I still don't understand the subject paper as well as I'd like. The section on splicing binders sets off alarm bells for me ... that looks like a problem that should be avoided rather than solved. I'd like to say that this supports the point I was making with Andreas that names and binding semantics shouldn't be in the same mechanism with modules, but I don't have the details worked out.

Anyway, it's interesting stuff. Thanks for posting, Sandro.

It actually seems that

It actually seems that staging might fit in well with the representation selection mechanism I'm building: a staged type could be chosen to represent a plain curried type.

If I'm understanding you right, this is close to the main insight of a previous paper on staging, which sought to clarify the issues surrounding term-level binding and staging by mapping staged code to closures.

I'm confused

I looked at this again, and I can't figure something out. If we have this definition:

module type EQ’ = sig type t val eq : t code → t code → bool code end
module MakeSetGen(Elt:EQ’ code) : (SET with type elt = Elt.t) code = ...  

Shouldn't this be a type error:

Elt.eq <x> <h>

The paper seems to explain that we can pull eq out of Elt by treating the code of records as a record of codes. Doesn't that mean that Elt.eq is a code? Why can we treat it as a function?

More fundamentally, I don't understand why we can't get the desired optimization without codes of modules. Why doesn't the following generate the optimal code?

module type EQ’ = sig type t val eq : t code -> t code -> bool code end
module MakeSetGen(Elt:EQ’) : (SET with type elt = Elt.t) code =
< struct
    type elt = Elt.t
    type set = elt list
    let rec member : elt -> set -> bool = fun x -> function
    | [] -> false
    | h:: t -> ~(Elt.eq <x> <h>) || member x t
  end >
module IntSet = !. MakeSetGen(
struct type t = int let eq x y = <(~x: int ) = (~y: int )> end )

Here, I've dropped the 'code' annotation from the type of Elt. By the time the code is generated for IntSet, the compiler should know the concrete type of Elt.t to be int.

As a simpler example, what happens here?

idGen: ('t -> 't) code
idGen = < fun x -> x >

id: int -> int  = !. idGen

Doesn't that generate a specialized version of id that's optimized to work on ints? Or would the generated code close over a type descriptor somehow?

Unimportant aside if one of the authors is lurking: the unlabeled code snippet following Figure 2 has some closing angle brackets left in it that probably should have been removed.

Here, I've dropped the

I haven't read the paper yet, but I hope I can still help.

Here, I've dropped the 'code' annotation from the type of Elt. By the time the code is generated for IntSet, the compiler should know the concrete type of Elt.t to be int.

I'd bet that MLton would indeed do what you want, at the price of having *no* separate compilation, and probably code explosion in some cases. Same for C++ templates, and there I'm sure on code explosion. EDIT: Matt is right in his comment below; MLton would (probably) produce the desired answer from a version of the snippet without staging annotations. For the actual question, see my other answer.

In general, the problem with automatic inlining is (for the compiler) deciding when to use it without having code explosion — solutions tend to be fragile heuristics. One reason staging is interesting is that fully automated solutions are never fully satisfactory.

I'm less sure why inline annotations aren't always enough, but recursive functions are an obvious problem (as we've seen earlier) — GHC just never inlines recursive functions, while staging gives you enough control for however much inlining you want. Moreover, with staging you can even inline a function at some call sites but not at others.

The paper seems to explain that we can pull eq out of Elt by treating the code of records as a record of codes. Doesn't that mean that Elt.eq is a code? Why can we treat it as a function?

You seem to refer to:

while the comparison function Elt.eq is modified to map code values to code values, just as power from section 2 was modified into power_gen

I bet "is modified" omits "by hand by the programmer", since that's what happened with power_gen (why passive, why?). Types seem to be handled differently, maybe because of phase separation?
EDIT: I retract, Sec. 3 states they can't do the usual staging thing, so I'll have to take a closer look instead of giving the usual staging answers.

I'd bet that MLton would

I'd bet that MLton would indeed do what you want

Well, note that I didn't drop all of the staging annotations. Just the ones from the module. The eq function is still explicitly staged, so you'd need to use some dialect of ML with staging support.

Better answers

1) On the typing complaints, the paper says:

The reader may have noticed typing issues in fig. 2. [...] We will discuss the implications of this issue in more detail in section 4.2

and then discusses both the typing issues, and the choice between codes of functions and functions of codes (briefly).
The problem is exactly that we often have the code of a record of functions, but we need somehow to access pieces of it. We don't just need to treat it as a record of codes, but as a record of functions of codes. Both steps are nontrivial, and somewhat discussed in 4.2 — but the authors aren't sure of how exactly to convert automatically between the two.

2) Compile-time inlining wouldn't work with first-class modules, which the paper seems to support (it briefly mentions OCaml's first-class modules). The paper seems to address, for instance, the scenario where a first-stage program decides (at its runtime) how to assemble first-class modules into a second-stage program.

Your example with Elt:EQ’ could work as given. But you could apply MakeSetGen to a module Elt fetched from a data structure — in that case, nowadays splicing wouldn't work because Elt.t is not stored at runtime, unless you switch to codes of modules.

Let me extend your "simpler example".

idGen: ('t -> 't) code
idGen = < fun x -> x >

id: int -> int  = !. idGen

idCode: (int -> int) code  = idGen
id1: int -> int = !. idCode

So, do we first compile the code to apply it to int? This would do type instantiation as usual, leaving any specialization to the goodwill of the compiler.

Or do we first create idCode, by applying idGen to <int>, and specializing it, and then turn the idCode into actual code?
If we use explicit polymorphism (a la System F, even a predicative version), the distinction is again the one between codes of functions, (forall {t}. t -> t) code, as in the first case (where I use braces {} around implicit arguments, Agda-style), and functions of codes, forall {t}. {t code} -> t code -> t code.

Thanks

1) Yeah, I understood that part, but was being careless. It looks right.

2)

So, do we first compile the code to apply it to int?

I don't think this is possible. Types are always resolved before run-time and the compilation (via !.) happens at run-time. So the int parameter should be available to the compiler. I think the issue is how it's available.

Look at what the compiler generates for idGen. It will be a closure over a type descriptor that generates a piece of code, and the code will refer to the closed-over type descriptor. But when you invoke that function at a given type descriptor, how is the returned code stored? I'm guessing the code itself is stored as a closure, and so the type descriptor sits around. Then, when the compiler compiles the code, it could itself just reference that type descriptor. So I'm guessing that's how types don't get inlined, and that's the problem we're trying to solve with this module mechanism.

Nevermind

No, actually regarding point 1, I think I was careless in deciding I was careless :). The paper still looks wrong to me. Instead of:

~(Elt.eq <x> <h>)

Why isn't it:

~((~Elt.eq) <x> <h>)

I think I understand the point of treating a code of module as a module of codes. But doesn't that mean that the fields should be code? The outer splice here is the one that comes from the return type of eq -- bool code. But in the original code snippet, where is the splice that takes the Elt.eq code and lets us use it as a function?

"Everything old is new again: Quoted Domain Specific Languages"

Since some questions here are about staging in general, I probably should have linked to this draft. While I haven't had time to read it in detail, I agree with what I've seen so far (including their assessment of importance).

Shayan Najd, Sam Lindley, Josef Svenningsson, Philip Wadler

http://www.cse.chalmers.se/~josefs/publications/oldnew.pdf

Abstract:

We describe a new approach to domain specific languages (DSLs), called Quoted DSLs (QDSLs), that resurrects two old ideas: quotation, from McCarthy's Lisp of 1960, and the subformula property, from Gentzen's natural deduction of 1935. Quoted terms allow the DSL to share the syntax and type system of the host language. Normalising quoted terms ensures the subformula property, which guarantees that one can use higher-order types in the source while guaranteeing first-order types in the target, and enables using types to guide fusion. We test our ideas by re-implementing Feldspar, which was originally implemented as an Embedded DSL (EDSL), as a QDSL; and we compare the QDSL and EDSL variants.

Cool paper, worth its own

Cool paper, worth its own story, which I started here.

Parameterized types require special scoping rules

The above-referenced article by Inoue, Kiselyov and Kameyama mentions that parameterized types require special scoping rules.

See a proposed solution in the following article on parameterized types.