Specifying ECMAScript via ML

Brendan Eich has just mentioned on the es4-discuss mailing list that we will be using ML as the definition language for the semantics of ECMAScript Edition 4. One of the immediate benefits of this approach will be that our definition will also serve as a reference implementation. LtUers will of course recognize this as the approach of "definitional interpreters" (discussed on LtU here and countless other times).
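
For readers who haven't seen the genre: a definitional interpreter gives the object language's meaning as an interpreter written in a well-understood metalanguage. Here is a minimal sketch in OCaml over a toy language invented for this post (it is not the actual ES4 code):

    type expr =
      | Num of float
      | Var of string
      | Plus of expr * expr
      | Let of string * expr * expr

    type value = Number of float

    (* The meaning of each form is given directly by ML itself. *)
    let rec eval (env : (string * value) list) (e : expr) : value =
      match e with
      | Num n -> Number n
      | Var x -> List.assoc x env
      | Plus (e1, e2) ->
          let Number n1 = eval env e1 in
          let Number n2 = eval env e2 in
          Number (n1 +. n2)
      | Let (x, e1, e2) -> eval ((x, eval env e1) :: env) e2

    (* "let x = 2 in x + 3" evaluates to Number 5. *)
    let _ = eval [] (Let ("x", Num 2., Plus (Var "x", Num 3.)))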

Our initial intention was to write a custom specification language that would be tailored to our purposes, with just the right core control features and datatypes to serve as an appropriately abstract model of ECMAScript. But before long, we realized that we were pretty much describing ML. Why reinvent the wheel by specifying, implementing, and documenting another programming language that was 80% ML?

The benefits of this approach are many: a definition in a formal language that itself has a clear and precise definition, the luxury of many robust and high-performance implementations (we'll be using OCaml extended with first-class continuations), and free "technology transfer" from all the existing ML literature and communities.

And finally, by releasing real software--written in a direct functional style--for reading, type-checking, and evaluating ECMAScript programs, we'll be providing a standard set of tools for static analysis and other research on the ECMAScript language.

That's very interesting (and

That's very interesting (and cool)! Of course, one has to wonder why ML and not (Haskell or Scheme)...

why ML

why ML and not (Haskell or Scheme)

All of the above would be excellent choices. Scheme might have been especially nice for quasiquotation and code generation, and we would be able to use call/cc out of the box instead of installing a patch. Haskell would be nice for describing the side effects in a monad and then specifying order of evaluation via do-notation (as is, we'll probably be writing this in stylized code closely resembling A-normal form). In some cases, both Scheme and Haskell might be easier for expressing graph structure.

On the other hand, ML provides just enough types to structure the ASTs and basic datatypes for modeling environments, values, and such, but without the complexities of monadic types. We get ref cells, exceptions, and continuations (almost) out of the box--which gives us most of what we need for modeling the effects of ECMAScript.
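
For instance (toy code with invented names, not our actual evaluator), mutable ES variable bindings map onto ref cells, and ES throw onto an ML exception:

    type value = Undefined | Number of float

    (* A mutable ES binding is modeled as an ML ref cell. *)
    type env = (string * value ref) list

    exception Thrown of value  (* models an ES throw *)

    let assign (env : env) (x : string) (v : value) : unit =
      try (List.assoc x env) := v
      with Not_found -> raise (Thrown Undefined)  (* stand-in for a ReferenceError *)

    (* An ES try/catch is just an ML exception handler. *)
    let try_catch (body : unit -> value) (handler : value -> value) : value =
      try body () with Thrown v -> handler v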

In the end, I think any of these languages would have sufficed. Mostly, I'm excited to see that, in the real world, EoPL-style interpreters are a practical way to provide a precise and formal language specification at a reasonably high level of abstraction.

ironic, ML was originally made for modelling code

I'd say the biggest advantage of ML is that it's the easiest functional language to port to imperative code. Scheme makes you think about implicit types, macro expansions and a spaghetti stack. Haskell forces you to think about point-free form, laziness (especially lazy IO), and the subtleties of monads. ML has less "magic".

Previously used Scheme

>> one has to wonder why ML and not (Haskell or Scheme)...

A friend of mine, who works for Microsoft on the ECMAScript team, told me that previous reference implementations were actually written in Scheme.

It would have been easy for them to have just stuck with Scheme, so they must have felt that ML offered enough of an advantage over Scheme for it to be worth their effort to switch.

Interesting. More details,

Interesting. More details, anyone?

Waldemar's CL stuff

I'm not aware of a Scheme connection, but in "JScript is a Functional Language" there's a link to semantics-related code for JavaScript, implemented in Common Lisp by Waldemar Horwat: here.

confused...

You're confusing Microsoft with ECMA. The ECMAScript working group is a multi-organization effort, with Microsoft as only one of the organizations involved. Our decision to use ML is for the purpose of specification, not implementation. It just so happens that, by being expressed in a programming language, the specification will also be executable, and can therefore serve as a reference implementation.

It would have been easy for them to have just stuck with Scheme, so they must have felt that ML offered enough of an advantage over Scheme for it to be worth their effort to switch.

There is no "switching" going on here--each organization with an ECMAScript-based product will continue to be free to implement their product however they see fit.

Just my own hobby-horse, but...

What's the story for floating-point verification? Do ECMAScript and ML use the same floating-point conversion libraries, and the same trig functions?

I am pretty sure that ECMAScript uses David Gay's gdtoa, which means it is very likely correct -- I know of no unfixed bugs in gdtoa, and this is something I tested in great detail in years past. ML should also use gdtoa, if it does not already.

In the case of trig functions, I do not know what either one uses, but they ought to match. Hardware trig functions are busted because they use a too-small approximation of PI for range reduction, which leads to catastrophic cancellation errors near zeroes of sin and cos, yielding as few as 4 significant bits instead of the 64 promised for 80-bit FP. Worse, last time I measured, the hardware was not significantly faster than software (fdlibm, properly compiled).

JavaScript has semantics?

Why wasn't I told??

cheap shot

That's easy to say, but the previous spec was actually written in a highly stylized pseudocode that, while far from perfect, was relatively consistent and clear. It suffered from being a bit too low-level and had a number of bugs (probably because the pseudocode wasn't executable), but as far as languages go, ECMAScript is head and shoulders above many popular programming languages for precision of definition.

(This is pretty much by necessity. When you only have one implementation, specification isn't so important--cf. Perl, Python, PLT Scheme, OCaml... But when you're trying to create a spec for multiple vendors in a competitive market, clear and precise specification becomes a whole lot more valuable.)

Better language for reference implementation than specification

I'm surprised OCaml will be the definition language, instead of just a reference implementation. ML is better than many alternatives, but why not Coq/Twelf/Isabelle or another POPLmark competitor? Was the deciding factor really just the distaste for monads? Is the spec giving up on mechanically checked proofs? As far as executable specifications go, programs extracted from Coq proofs are very usable.

Also, the first two links are to the es4-private mailing list, not the public one.

Re: reference implementation than specification

The point is to serve both purposes simultaneously. It needs to be readable at the same time as being executable. Putting everything into a monad would work, but would raise the bar for comprehension.

I expect that a whole-program translation from the ML code into a pure mini-ML via a monadic style should be entirely straightforward. This will likely be important for trying to prove theorems down the road, especially if we attempt to employ theorem provers.
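
To illustrate with a toy example (invented code, not from the spec), here is a direct-style fragment using a ref cell, followed by its mechanical translation into a pure state-passing (monadic) mini-ML:

    (* Direct style, using ML's built-in effects: *)
    let counter = ref 0
    let tick () = counter := !counter + 1; !counter

    (* The same computation translated into a pure state monad: *)
    type ('a, 's) st = 's -> 'a * 's
    let return x = fun s -> (x, s)
    let ( >>= ) m k = fun s -> let (a, s') = m s in k a s'
    let get = fun s -> (s, s)
    let put s' = fun _ -> ((), s')

    let tick_pure : (int, int) st =
      get >>= fun n ->
      put (n + 1) >>= fun () ->
      return (n + 1)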

However, it's one thing to ask readers of the spec to learn ML. It's another to ask them to learn to read monadic programs written in the Calculus of Inductive Constructions.

(Re: the private links -- thanks. I corrected the second link, but not the first; you have to subscribe to view the archives. Subscription is open to the public.)

call/cc is problematic in OCaml!

Dave Herman wrote:
``(we'll be using OCaml extended with first-class continuations)''

You mean Xavier Leroy's call/cc patch? Are you aware of the
experimental and totally unsupported status of this patch? Xavier
Leroy explicitly said to use it at one's own risk and not to blame
him. If you look at the code, you'll see some of the interesting
issues (like writing expressions in a certain way specifically to
disable the tail-call optimization).

The main problem is the formal status: undelimited continuations and
call/cc have issues in a strongly-typed language. Section 1 of Gunter,
Remy and Riecke's `A generalization of exceptions and control in
ML-like languages' discusses the issues in detail. The type 'a cont is
quite tricky. And even if we take into account the `weakness' of the
type variables, we still have problems; with call/cc, `strong
soundness fails to hold', as Gunter et al say -- and they then give an
example (on p. 2 of their paper).

It seems that their approach -- make the prompt type explicit -- is
the simplest (although not the only one: see the Danvy/Filinski DIKU
TR) way to introduce continuations in a typed language while
preserving soundness. These continuations will necessarily be
delimited. Gunter et al proved strong soundness: the value yielded by
any expression indeed has the type predicted by the typechecker.

Incidentally, another problem with call/cc in OCaml is that it
interacts badly with exception handling. When the captured
continuation is invoked, all existing exception handlers are removed
and replaced with those that were in effect when the continuation was
captured. I mean all of the exception handlers. Beware: there is no
dynamic-wind that comes with call/cc in OCaml.

call/cc

Note that I carefully said "first-class continuations," not specifically callcc. Xavier's patch is a possibility, but your patch is another one. We haven't settled on specifics -- we're only getting started. In fact, we only need enough expressive power to model very simple control effects. In most cases, ML exceptions are enough. But we also need a small amount of delimited continuations to model the new yield feature (similar to Python's coroutines). For this purpose, either delimited or undelimited continuations would do just fine.
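
For example, a control effect like ES break/continue needs nothing beyond ML exceptions. A sketch, with invented names:

    exception Break
    exception Continue

    (* Models "while (cond) body" where body may break or continue. *)
    let es_while (cond : unit -> bool) (body : unit -> unit) : unit =
      try
        while cond () do
          (try body () with Continue -> ())
        done
      with Break -> ()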

As for the lack of support, this is intended as a reference rather than a production implementation. If first-class continuations are well-defined for OCaml, I'm not so concerned about them being well implemented. If they're not well-defined, however, that's an issue. Thanks for the pointers.

delimited continuations in OCaml

Dave Herman wrote:
``Note that I carefully said "first-class continuations," not
specifically callcc. Xavier's patch is a possibility, but your patch
is another one. We haven't settled on specifics -- we're only getting
started. In fact, we only need enough expressive power to model very
simple control effects. In most cases, ML exceptions are enough. But
we also need a small amount of delimited continuations to model the
new yield feature (similar to Python's coroutines).''

I guess I'm simply not used to caml-shift being referred to as a
`patch'. It doesn't actually patch the compiler, etc. -- it merely
uses the `API' that the run-time provides. All the needed
symbols are public. The delimited continuations implemented by
caml-shift are fully typed, and preserve strong type soundness of the
language. In fact, take_subcont is essentially cupto, and push_prompt
is `set'. These delimited continuations are fully consistent with
OCaml's own exceptions, with push_prompt acting like the `try'
form. This is no surprise as delimited continuations are actually
implemented in terms of exceptions.

Regarding yield: it certainly can be realized within the caml-shift
framework. In fact, the end of the file

http://pobox.com/~oleg/ftp//ML/vdynvar.ml

does exactly that. It takes only one line:
let yield () = shift p0 (fun f -> J f) in.... The file
implements a simple threading system (whose point was to
illustrate thread-local and thread-shared dynamic bindings).
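
Spelled out as a self-contained sketch (gen_state, make_gen and
drain are invented for this message; I assume only the library's
new_prompt, push_prompt and shift):

    open Delimcc  (* the caml-shift delimited-control library *)

    type 'a gen_state =
      | Yielded of 'a * (unit -> 'a gen_state)  (* value + resumption *)
      | Done

    let make_gen (body : ('a -> unit) -> unit) : unit -> 'a gen_state =
      let p = new_prompt () in
      let yield v = shift p (fun k -> Yielded (v, k)) in
      fun () -> push_prompt p (fun () -> body yield; Done)

    let rec drain step =
      match step () with
      | Done -> ()
      | Yielded (v, k) -> Printf.printf "%d\n" v; drain k

    (* Prints 1, 2, 3. *)
    let () = drain (make_gen (fun yield -> yield 1; yield 2; yield 3))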

Re: delimited continuations in OCaml

I'm not used to caml-shift being referred to as a `patch'

Right you are, thanks.

call/cc and type soundness

The main problem is the formal status: undelimited continuations and call/cc have issues in a strongly-typed language.

I'm aware of the problem raised by Harper and Lillibridge in 91, but I believe the value restriction takes care of that. Are you referring to another problem? I'll check out the Gunter et al paper -- thanks for the reference.

SML/NJ has had undelimited call/cc for decades, so I'd be surprised to learn that it's (still) unsound.

It seems that their approach -- make the prompt type explicit -- is the simplest (although not the only one: see the Danvy/Filinski DIKU TR)

Which TR is that? [edit: is it this one?]

I would definitely prefer delimited continuations, since there should always be clear delimiters for ECMAScript control effects (including yield). But if we end up with SML, that may not be an option. Since our control effects are limited in power, the interactions with exceptions may not be problematic--for example, since yield is only one activation deep, the exception handlers in the context of the generator function are irrelevant. But we'll definitely have to work out the details, e.g., what happens to any exception handlers within the generator function's activation.

Clearly I need to re-read your recent ICFP paper on delimited dynamic binding! :)

[edited: got rid of some bogus claim about the parent contexts of coroutining functions]

As Much As I Love O'Caml...

Dave Herman: The benefits of this approach are many: a definition in a formal language that itself has a clear and precise definition...

This is true of Standard ML, but not O'Caml. O'Caml and Haskell both lack formal semantics.

Standard ML is what you want

IMHO, between the two MLs, Standard ML is the right choice for this kind of work. O'Caml doesn't have anything close to a precise definition, and the existing spec contains many pitfalls. For example, the order of evaluation of arguments in O'Caml is unspecified, which could easily lead to the behavior of such a definitional interpreter changing between revisions of the O'Caml compiler.
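
A concrete illustration of the pitfall (a toy program, not from any spec):

    let tag name v = print_string name; v
    let pair x y = (x, y)

    let () =
      let _ = pair (tag "left " 1) (tag "right " 2) in
      print_newline ()
    (* Today's O'Caml prints "right left "; "left right " would be
       equally conformant, so a definitional interpreter must never
       let the object language's semantics depend on this order. *)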

perhaps

O'Caml and Haskell both lack formal semantics

I knew someone would call me on that one. :)

SML/NJ might be nicer in a couple of respects: the benefit of a true formal semantics, as well as (if I remember right) built-in callcc. OTOH, in our case we intend to use such a simple core of any ML that (modulo syntax) our evaluator should be trivially translatable from one ML to another, and in practice I wouldn't expect the behavior to change at all. Perhaps I'm being naive?

For example, the order of evaluation of arguments in O'Caml is unspecified

We never intend to rely on the metalanguage's order of evaluation: wherever the order of evaluation of ES expressions matters (which is to say, everywhere), we will make absolutely explicit what happens when, by writing in a style reminiscent of ANF.
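
Concretely, a toy fragment (not actual spec code):

    (* Avoided -- the order of the two recursive calls is up to the
       compiler:   eval env e1 +. eval env e2
       Written instead, so that let fixes the order explicitly: *)
    let eval_plus eval env e1 e2 =
      let v1 = eval env e1 in  (* left operand first, as the ES spec requires *)
      let v2 = eval env e2 in  (* then the right operand *)
      v1 +. v2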

Advantages of O'Caml?

What advantage does O'Caml offer over SML for the purpose of giving a definition for ECMAScript 4?

not a whole lot

We picked OCaml mostly out of familiarity, and perhaps for community support. But the feedback in this discussion has been very helpful and it's started some internal discussion amongst the group -- we're considering switching to SML. (This is the benefit of getting early feedback from the community!)

It may not make much of a difference, but there's a certain peace of mind that comes with using a language with proven type soundness. Incidentally, does anyone know if the official definition of SML includes call/cc in its semantics?

IIRC...

It doesn't; call/cc is an SML/NJ-specific extension.

It does exist in MLton,

It does exist in MLton as well, though.

True, but it can be rather

True, but it can be rather slow. In my thesis I have a benchmark that ran 49 times slower under MLton than under SML/NJ because it was making heavy use of call/cc (it actually uses Filinski's ML implementation of shift/reset, which makes use of call/cc and a single state cell). This is in contrast to most of my other benchmarks, where MLton is generally much faster than SML/NJ.

soundness of SML

I don't think SML enjoys proven type soundness. I think it's *rigorously defined*, which means Harper, Milner, Tofte, Stone, etc. wrote up a bunch of Greek that defines the static and dynamic semantics of SML. But I don't think anyone has ever proven type safety based on these definitions, though I think some safety results were shown for small subsets. However, people at CMU are working on proving type safety for all of SML and also mechanizing the proof in Twelf. They are almost there -- they have proved type safety for a sufficiently expressive IL, and what's left is to show an elaboration from full SML to this IL (simultaneously defining the semantics of SML in terms of this elaboration).

At that point, SML will be the only practical general-purpose language enjoying a full, mechanized type-safety proof. (And if it weren't mechanized, no one would believe it.) As such, elaborations into SML will give a precise mathematical meaning to the language you're specifying.

(It's also a bit unclear whether the SML that Bob Harper is specifying is the same as the one implemented by SML/NJ, but that's less important.)

In any case, I think elaborating into SML (or languages defined with equal rigor) is the future of specification for high-level languages.

elaborating into SML

I think elaborating into SML (or languages defined with equal rigor) is the future of specification for high-level languages.

It's certainly one viable approach (of course I think so), but it's not without its drawbacks. Specifically, it doesn't necessarily make the theorems you want to prove easy. For example, the best-known approach to date for proving type soundness is via small-step operational semantics. Our proof techniques in CS are fragile, and even the choice between obviously equivalent models can have drastic effects on our ability to prove various kinds of theorems.
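
For concreteness, the standard small-step recipe (Wright and Felleisen's syntactic approach) consists of two lemmas:

    Preservation: $\emptyset \vdash e : \tau$ and $e \to e'$ imply $\emptyset \vdash e' : \tau$.
    Progress: $\emptyset \vdash e : \tau$ implies $e$ is a value or $e \to e'$ for some $e'$.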

(BTW, thanks for the clarification on the status of SML's soundness.)

Dave Herman: OTOH, in our

Dave Herman: OTOH, in our case we intend to use such a simple core of any ML that (modulo syntax) our evaluator should be trivially translatable from one ML to another, and in practice I wouldn't expect the behavior to change at all.

Pidgin ML, then?

Dave Herman: Perhaps I'm being naive?

Absent proofs, only time will tell.

Ditto

Dave Herman: the luxury of many robust and high-performance implementations

I had pretty much the exact same reaction as Paul... you described SML, not O'Caml.

I think that SML's implementations are not something to be taken lightly. No other language has achieved the same level of consistency across as many implementations.

SML

Thanks to the feedback here and on es4-discuss, we've decided to go ahead with SML/NJ instead of OCaml as our development platform for the reference evaluator of ES4. The primary reasons for the switch are the formal semantics, built-in callcc, and stability of definition (OCaml is an active, bleeding-edge research language; SML is an old, stable, and conservative language).

Note that this is not meant as a slight on OCaml in any way: our needs for this task are somewhat peculiar in that we are doing a specification rather than trying to build a robust or high-performance implementation.

And, in fact, it probably makes little difference in practice: I still expect that the vast majority of our code will be trivially translatable between multiple dialects of ML.

Thanks everyone for your input.

using sml for es4

good move :-). let me know if i can be of any help.

bob harper

Getting updated?

How can we get more information on the progress of JS 2?

update coming soon

We'll be refreshing the public export of the wiki soon. I'll post again when we do.