ECMAScript Edition 4 Reference Implementation

The first pre-release of the reference implementation of ECMAScript Edition 4 (a.k.a. JavaScript 2) is now available. We've created a new web site for information about the ECMAScript specification and reference implementation. You can download source and binary forms of the reference implementation.

As we've discussed before here on LtU, the reference implementation of ECMAScript is being written in Standard ML. This choice should have many benefits, including:

  • to make the specification more precise than previous pseudocode conventions
  • to give implementors an executable framework to test against
  • to provide an opportunity to find bugs in the spec early
  • to spark interest and spur feedback from the research and user communities
  • to provide fodder for interesting program analyses to prove properties of the language (like various notions of type soundness)
  • to use as a test-bed for interesting extensions to the language

This pre-release is just our first milestone, i.e., the first of many "early and often" releases. Neither the specification nor the reference implementation is complete, and this early implementation has plenty of bugs. We encourage anyone interested to browse the bug database and report additional bugs.

We're happy to hear your feedback, whether it's bug reports or comments here on LtU or on the es4-discuss mailing list.

Uh, that won't work

http://www.ecmascript-lang.org/download.php indicates:
"NOTE: The Cygwin binary requires an installation of Cygwin and SML/NJ v110.65 or later. In the future we hope to provide native Windows executables."

The latest SML/NJ version is v110.65:

http://www.smlnj.org/dist/working/110.65/index.html

... suggests that the archive for Win32 is here:

http://smlnj.cs.uchicago.edu/dist/working/110.65/smlnj.zip

Unfortunately, it doesn't exist - 404 :)

Use v110.64

Whoops! Thanks for the heads up. Actually, v110.64 will work just fine. I'll update the web site to say 110.64.

Downside

It seems there are some pretty big downsides to doing the reference in SML, too:

  • About 0.1% of working programmers have a working knowledge of SML. This means that implementors will have a very hard time reading the code and understanding the implementation.
  • There is very little chance that any production implementations will be done in SML. They will most likely be done in C, C# or Java -- all of which are very different from SML. That means production implementations will have to start from scratch.

Though I guess doing it in ML is probably better than doing it in Javascript ;)

PL implementors

I don't think the purpose of the spec is to provide usable plug and play libraries for the implementors to use. I'd rather suspect that it's intended to be an exacting specification - something which C, C# and Java would have a hard time with. And most of the JavaScript implementations are going to be based on current implementations anyhow.

As for the percentage of the working population that is familiar with SML, there are mitigating factors: (a). Implementors of PLs are far more likely to know SML than the population at large - PL implementors are not your normal run-of-the-mill working programmers; (b). Worst case is they have to learn SML which would be a good exercise - PLT is heavily influenced by such non-mainstream languages as Scheme, ML and Haskell.

Good Points

I would pay my hard-earned cash for a book (in the spirit of Knuth's "TeX: The Program" -- discussed in another post today) that described the Javascript implementation with the high-level code interspersed. Not only would it be great for PL implementors, but also for those that want to learn SML and those that want to understand the depths of Javascript.


But maybe I'm the only one ;)


--t

Follow the types

In ML (or any PL for that matter), the data types should give a roadmap of where you're starting from and where you're headed. If the types are not well thought out, then the algorithms have to perform lots of gymnastics. PL implementations are very much about taking a starting data type and transforming it into an ending data type, with intermediate data types thrown in to make it more manageable.

Just glancing at the source, my uninformed guess would be that the best place to dive in would be in the abstract syntax tree (ast.sml). I'm thinking that it likely defines the structure after lex and parse are finished with their duties but prior to the eval/apply cycle. Lex and parse are fairly well understood pieces (what some would call a solved problem in computer science), though they always get a bit involved in details. The evaluation (eval.sml) is likely where the real work is done in terms of language semantics. Of course, you could just follow the sequence starting from main.sml and see where that leads you.

The advantage of starting from the datatypes is that you really don't need to understand ML in any great detail to get started. You have to be aware that datatypes are a more powerful abstraction than enumerations in C like languages. But once you view all operations in terms of types, the rest is just implementation details.
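As a rough illustration of why a datatype carries more than a C enum, here is a toy expression type. The names (`expr`, `Num`, `Add`, `Neg`, `eval`) are hypothetical and are not taken from the ES4 `ast.sml`; the sketch only shows that each constructor carries its own data and that the evaluator's clauses follow the shape of the type.

```sml
(* Toy AST for illustration only (not the ES4 ast.sml):
   unlike a C enum, each constructor carries its own data. *)
datatype expr =
    Num of int
  | Add of expr * expr
  | Neg of expr

(* One clause per constructor: the function follows the type. *)
fun eval (Num n) = n
  | eval (Add (a, b)) = eval a + eval b
  | eval (Neg e) = ~ (eval e)

(* 2 + (-5) *)
val example = eval (Add (Num 2, Neg (Num 5)))
```

Reading the `datatype` first tells you every case `eval` has to handle before you read any of its code.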

As I understand it, the ES4 reference implementation uses a fairly limited subset of SML. A quick glance at the code seems to show little use of the module system. And other than continuations, I doubt that the implementation relies on any exotic features that are not well understood by implementors of other PLs.

Whatever one's take on functional programming languages, one thing that's not particularly debatable is that they make excellent platforms for exploring programming languages. This ES4 implementation should be seen as similar in spirit to the Pugs implementation of Perl6. Whether it eventually finds use in the final JavaScript implementations is not really that important. What it does is give a nice implementation that can be poked and prodded, exposing any problems in the specification and allowing experimentation with possible alternatives.

I personally think that all language communities should be encouraged to use FPs for their reference implementations.

maybe SML to Scheme?

Chris Rathman: Worst case is they have to learn SML which would be a good exercise - PLT is heavily influenced by such non-mainstream languages as Scheme, ML and Haskell.

Scheme I can handle — I even like it, and plan to implement it again. But I doubt I can get the others without more effort than I'm willing to spend. Back in the middle 90's when I was reading Appel's Compiling with Continuations, which uses NJ SML quite a bit for examples, I had a lot of trouble getting my head around the pattern matching based method dispatch. I had an easier time reading the C implementation of SML at the time, than I did reading SML code. I eventually gave up trying to read the SML code as non-productive.

Part of the problem is that I hate any form of implicit dispatch which "finds a suitable match" for parameter types. It's what I hated about Dylan's generic functions, even when I understood how it worked exactly from reading the borrowed spec from Common Lisp. I prefer to know statically which specific place I can look to find an implementation from just seeing the place where a call occurs. Basically I prefer "do exactly what I say" languages to "do what I mean" languages.

(It might be related to the way math as geometry is natural to me, but math as abstract algebra leaves me feeling parts are missing.)

A spec in Scheme would be a lot better. :-) Does anyone recommend an SML to Scheme translator written in Scheme? I might have an easier time understanding SML specs after translation.

val rec fib = fn 0 => 0 |

val rec fib = fn 0 => 0
               | 1 => 1
               | x => fib (x - 2) + fib (x - 1);

translates to:

(define (fib x)
  (cond
    ((= x 0) 0)
    ((= x 1) 1)
    (else (+ (fib (- x 2)) (fib (- x 1))))))

EOPL

I've tinkered with the idea of translating Essentials of Programming Languages to ML but haven't been very patient with the material. (There's a much better effort than mine that's an F# translation).

I would assume that EOPL is fairly idiomatic Scheme when exploring programming language implementation issues. And the similarity with ML strikes me as obvious. So a datatype in Scheme would be defined as:

(define-datatype expression expression? 
  (var-exp
    (id symbol?))
  (lambda-exp
    (id symbol?)
    (body expression?))
  (app-exp
    (rator expression?)
    (rand expression?)))

Which roughly corresponds to the SML type of:

(* assumption: symbols are represented as strings *)
type symbol = string

datatype expression = Id of symbol
                    | Lambda of symbol * expression
                    | Apply of expression * expression

And a function that uses this data type in Scheme:

(define occurs-free?
  (lambda (var exp)
    (cases expression exp
      (var-exp (id) (eqv? id var))
      (lambda-exp (id body)
        (and (not (eqv? id var))
             (occurs-free? var body)))
      (app-exp (rator rand)
        (or (occurs-free? var rator)
            (occurs-free? var rand))))))

would roughly look like this in ML:

fun occurs_free (var, Id sym) =
      (var = sym)
  | occurs_free (var, Lambda(id, body)) = 
      (id <> var) andalso occurs_free(var, body)
  | occurs_free (var, Apply(rator, rand)) = 
      occurs_free(var, rator) orelse occurs_free(var, rand)

Perhaps I'm reading too much into it, but the EOPL code reads (is structured) very much like what I expect to see in SML. The types look similar in definition and use something similar to a pattern-matching mechanism. The parallels are not always this obvious, but I'm thinking if you're looking at the SML implementation of es4 through the lens of Scheme, the macros used by EOPL might be helpful.

eopl macros

Chris Rathman: but I'm thinking if you're looking at the SML implementation of es4 through the lens of Scheme, the macros used by EOPL might be helpful.

Thanks, you were very kind to put this much effort into your reply, and the macro hint is probably more useful to me than it seems (and it got me reminiscing). I just now posted a copy of the original record definition macros I wrote in 1993 for EOPL under MacGambit, which I posted to usenet. I think Friedman subsequently adapted them further, since you can find my name tacked on later versions.

(The MacGambit docs at the time didn't specify how macros actually behaved, so I made a wild guess they acted exactly like the macros I'd added to my own Scheme implementation, and luckily I was right. But it's been fourteen years since I've thought much about EOPL macros, and I'll have to come back up to speed again.)

There is no try

Part of the problem is that I hate any form of implicit dispatch which "finds a suitable match" for parameter types. It's what I hated about Dylan's generic functions, even when I understood how it worked exactly from reading the borrowed spec from Common Lisp.
...
I prefer to know statically which specific place I can look to find an implementation from just seeing the place where a call occurs.

SML gives you exactly that. SML's functions aren't "implicit dispatch", and they're nothing like Dylan's or Common Lisp's generic functions.

As Gavin's example shows, SML's pattern matching is more or less just syntactic sugar for case expressions. The definition of a function still occurs in just one place.
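To make that concrete, here is Gavin's `fib` in clausal form next to the same function written with an explicit `case`. The Definition of Standard ML treats `fun` clauses as a derived form, roughly along these lines (its exact desugaring differs in detail). Either way there is a single definition to look at, and the clauses are tried in order from top to bottom; `fib'` is just an illustrative name.

```sml
(* Clausal form, as in Gavin's example. *)
fun fib 0 = 0
  | fib 1 = 1
  | fib x = fib (x - 2) + fib (x - 1)

(* The same function with an explicit case: one definition,
   clauses tried in order from top to bottom. *)
fun fib' n =
  case n of
      0 => 0
    | 1 => 1
    | x => fib' (x - 2) + fib' (x - 1)

(* Both forms agree on a few sample inputs. *)
val agree = List.all (fn n => fib n = fib' n) [0, 1, 2, 5, 10]
```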

Of course, with modules ("structures") you can have a function with the same name in more than one place, but which of those is invoked by a call is determined statically, either by the use of a compound name like "Foo.bar" or by using "open Foo" to import a structure's bindings into the current namespace. In that respect, it's no different from most languages with some kind of module or library feature. It can get a bit more indirect with higher-order modules, but it's still statically determined.
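A small sketch of that static resolution: `Foo` and `bar` follow the names used above, while `Baz` is a hypothetical second structure added for contrast. Which `bar` runs is decided by the name written at the call site (or by the nearest `open`), never by the runtime types of the arguments.

```sml
structure Foo = struct
  fun bar x = x + 1
end

(* Hypothetical second structure with a function of the same name. *)
structure Baz = struct
  fun bar x = x * 2
end

(* A qualified name refers to exactly one definition. *)
val a = Foo.bar 10   (* 11 *)
val b = Baz.bar 10   (* 20 *)

(* "open" imports Baz's bindings into this scope only; which bar an
   unqualified call refers to is fixed when the code is compiled. *)
local
  open Baz
in
  val c = bar 10     (* 20, i.e. Baz.bar *)
end
```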

Check out a tutorial like A Gentle Introduction to ML. ("Compiling with Continuations" is not exactly an intro ML text!) SML is a pretty straightforward language, and it's worth learning if you have any interest in PL semantics or functional programming.

excellent!

I always greatly enjoy the clarity of your posts. I'm understating this quite a bit. (I'd vote you person most in need of cloning for the good of the rest of us.) Thanks very much to you and Gavin for correcting my error; it cheers me up a lot. I'm keeping this short, and only one reply, to keep from adding a lot of noise.

understanding ML

I would personally recommend a heavy dose of the lambda calculus to grok the nature of ML (and very much Haskell). The standard reference, Pierce's "Types and Programming Languages", carries this, though I'm not sure of its strengths for self-study. The matching (case) structure becomes much simpler and clearer at this level, and seeing it in full lambda form can elucidate the reason we make convenient little shortcuts around its verbosity. Also, that "geometric" approach relates to the lambda calculus as well: you can always refer to what is actually going on without the sugar.

SML Resources

[...] I hate any form of implicit dispatch which "finds a suitable match" for parameter types.

In that case it might be reassuring to know that SML is, in fact, a rather simple and explicit language and the kind of implicit dispatch you seem to be talking about simply does not happen in SML.

For someone serious about learning SML, I would strongly recommend reading a book. The MLton wiki has pointers to SML resources including tutorials and books. People (including me) on the #sml channel at irc.freenode.net might also help with understanding details of the language.

Scheme and ML...

...have a lot in common (at least that's my working hypothesis in my SICP translations) - both being very much based on lambda calculus.

If I remember correctly, Dave Herman knows Scheme better than ML (at least prior to starting this project), and since he's heavily involved in ES4, he could probably clue you in on the similarities and differences. I'd suspect that a translation to Scheme would probably be achievable, though I doubt that automated tools would be of much use in producing nice, human-digestible code.

(Perhaps a minimal implementation of Scheme in O'Caml that was given a while back might help).

I Agree

More importantly, PL implementors are the kind that will go out and learn SML. And they are that 0.1% that actually do SML. :o)

I wasn't aware that SML was

I wasn't aware that SML was that complex of a language.

Neither is Spanish

But I'd have a hard time reading a spec in Spanish -- and I wouldn't want to have to learn it to do so. Besides, it's not just the language; it's also the "idioms".


--t

Learning to read

Learning to read a new programming language is much easier than learning to read a new natural language.

Of course

But there is still a lot of effort. Particularly for someone who will never ever have a need to use that language. The vast, vast majority of working programmers couldn't use ML if they wanted to.


Saying "just learn ML" to the working programmer -- the proletariat -- is like the bourgeoisie saying, "Just let them eat cake."


Programmers Unite!


:D


--t

The vast majority of working programmers

I work as a programmer for living.

Let me be frank. A competent programmer should be able to learn how to read (not necessarily write) ML with a reasonable effort -- say three days of self-study following a good book.

I don't believe that the vast majority of programmers would be even remotely interested in reading, let alone understanding, the source code of a model implementation of ES4.

Knowing the alternatives, I think that ML is truly superior to most mainstream languages (including C, C++, Java, C#, Perl, Python) for this kind of work. Using a mainstream language just to appeal to the masses would be a horrible idea.

The oppressor in the mirror

The vast, vast majority of working programmers couldn't use ML if they wanted to.

If you can write a program in Javascript, you can learn to write a program in ML. Programmers familiar with Javascript's higher-order functions should have little difficulty with the basics of SML. The biggest barrier here is the prejudice that ML (and SML in particular) is too difficult for a working programmer to learn.

Even if we grant the existence of hypothetical programmers who "couldn't use ML if they wanted to", such programmers would not be reading the reference implementation of a programming language, so there's no issue here.

Saying "just learn ML" to the working programmer -- the proletariat -- is like the bourgeoisie saying, "Just let them eat cake."

Except that since the time that Marie Antoinette didn't say that, we've achieved a situation in which there's an almost limitless supply of freely downloadable cake on the Internet, in the form of web sites, papers, articles, and books, devoted to teaching anyone who is willing to learn. LtU's Getting Started page lists a supply of such confections. Please, help yourself! Or is the proletariat not allowed to better themselves? (In which case, I have violated the law myself — lock me up!)

To use the vernacular of a more modern fairy tale than the one told by Marx, languages like SML are a red pill: a gateway to a view of the world which explains much more about the world you previously lived in than you ever imagined existed. You would withhold this red pill from working programmers? Who is the oppressor now?

More seriously, as already mentioned, the purpose of doing this in SML is to provide a precise and useful semantic description of the language, which is something that is not realistically achievable by using any of the popular programming languages. The alternative ways to achieve this goal do not involve, say, Python or Java, but instead would involve a mathematical semantic description, using e.g. an operational semantics. Unlike an SML implementation, such a description would be a guaranteed proletariat-excluder, because it takes years of study to learn to understand such descriptions. SML is one of the most accessible of the viable alternatives here.

The ECMAScript team has chosen an approach which is about as accessible as it can be without compromising the goals of the reference implementation. At the same time, they're providing a wonderful opportunity for any ordinary programmer who may be interested in the semantics of programming languages, Javascript or otherwise, to learn by example and expand their horizons. It doesn't get much more egalitarian than that.

Programmers, throw off the shackles of low expectations!

Follow-Up on Important Point

The goal of having the reference implementation in Standard ML isn't to make it easy to use the reference implementation in a browser (or any other program, for that matter). The goal is, as Anton points out, to have it be in anything that's more readable than, say, the denotational semantics of R5RS given in the report. Several of us here on LtU convinced the team to switch from O'Caml to SML because of the fact that SML has a formal semantics (which is itself being mechanized), whereas O'Caml, nice as it is, lacks such a semantics. So in the limit, once the semantics of SML has been mechanized, then the semantics of ECMAScript become formal for free, while remaining (again, relative to a denotational or operational semantics) readable. This should prove a boon to developers of ECMAScript implementations in any other language: differing behavior from the reference implementation is a bug, and the problem won't be that the reference implementation is ambiguous or unclear. This, in a nutshell, is the point of formalization, and I maintain that it is highly desirable in future programming language designs.

In which I call foul again

The goal is, as Anton points out, to have it be in anything that's more readable than, say, the denotational semantics of R5RS given in the report.

Why can't the R5RS denotational semantics be read as a functional program? It seems to me that the two are about equally readable.

So in the limit, once the semantics of SML has been mechanized, then the semantics of ECMAScript become formal for free, while remaining (again, relative to a denotational or operational semantics) readable.

To quibble, the semantics of ECMAScript doesn't become formal (or is the reference implementation normative?). The formal semantics of SML only allows researchers to prove properties of the reference implementation. Besides, the semantics of ECMAScript could have been formalized from the get-go by just biting the bullet. No need for SML, which only stands in the way here.

For the use case you describe, an implementer testing against the reference implementation, it (the reference implementation) seems overly complicated. The more features of SML it uses, the more possible it is that there is a "bug" in the interaction between SML and the intended semantics. Since Reynolds '72, we've known that definitional interpreters should be independent of evaluation order of the metalanguage (ie, written in CPS). Glancing at the implementation, it seems like exceptions are possibly used for control flow. If so, then this might be a mistake; they should probably only be used to abort the implementation when it goes wrong. Using callcc is a bad idea, just write the code in CPS. Etc.
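For readers unfamiliar with the Reynolds point, here is a minimal sketch of a definitional interpreter in continuation-passing style for a hypothetical three-construct language; it is not derived from the ES4 code. The order in which operands are evaluated is spelled out by the nesting of continuations rather than inherited from the metalanguage, and an abort (division by zero) is expressed by discarding the continuation instead of raising an SML exception.

```sml
(* Hypothetical toy language, for illustration only. *)
datatype expr =
    Num of int
  | Add of expr * expr
  | Div of expr * expr

datatype result = Value of int | Error of string

(* eval e k: evaluate e and pass its value to the continuation k.
   The left operand is always evaluated before the right one. *)
fun eval (Num n) k = k n
  | eval (Add (a, b)) k = eval a (fn x => eval b (fn y => k (x + y)))
  | eval (Div (a, b)) k =
      eval a (fn x => eval b (fn y =>
        if y = 0
        then Error "division by zero"   (* abort: k is simply dropped *)
        else k (x div y)))

(* The initial continuation wraps the final value. *)
fun run e = eval e Value
```

Because every control transfer is an explicit function call, the interpreter does not depend on SML's exceptions or on `callcc` to model non-local exits.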

So all in all, it's not a formal semantics, but still just a (formalizable) reference implementation. When read as a semantics, there is a danger that it overspecifies the language, as most implementations do.

I call foul

The alternative ways to achieve this goal do not involve, say, Python or Java, but instead would involve a mathematical semantic description, using e.g. an operational semantics. Unlike an SML implementation, such a description would be a guaranteed proletariat-excluder, because it takes years of study to learn to understand such descriptions.

I have to disagree that a big-step operational semantics, say, takes years of study---unless you mean the four years of undergraduate computer science that are probably needed to appreciate the reference implementation anyway. :)

For the sorts of people who care about semantics (me), it would be more accessible and probably more useful than an implementation in SML.

Appeal to the ref

I have to disagree that a big-step operational semantics, say, takes years of study---unless you mean the four years of undergraduate computer science that are probably needed to appreciate the reference implementation anyway. :)

I was definitely counting those years, yes. My point is that a reasonably smart "working programmer" who can code in some mainstream languages, but has little or no formal CS education, is likely to be able to learn enough SML to experiment with the reference implementation, and get something useful out of that, without needing any formal study. OTOH, a formal semantics of any kind is likely to be a serious barrier for most such programmers.

For the sorts of people who care about semantics (me), it would be more accessible and probably more useful than an implementation in SML.

Sure, it's a compromise. I take your point about issues with e.g. overspecification, but I think the SML reference implementation is at least potentially a better compromise than you seem to be suggesting. With a little care, I would think that the core of the reference implementation could be kept somewhat independent and closer to a formal semantic description than the rest of the implementation. (If that hasn't been done, well, let's all rag on Dave... ;-)

Although it's on a smaller scale, I did something like that with my implementation of the R5RS semantics (here): the semantic core is essentially a transliteration of the R5RS semantics, and I built a simple interpreter around that. Of course, the core in that case is still CPS. I imagine the ECMAScript implementation could do that if it was considered worthwhile, but I doubt a real formal semantics is their highest priority. I still think that an SML reference implementation will go a long way towards staying honest, semantically speaking, helping avoid what you described in this comment as "dozens of language features and effects that are subtly wrong, because they were designed by hacking on an abstract machine or small-step reduction semantics instead of some higher level semantics".

"I'll allow it"

Sure, it's a compromise. I take your point about issues with e.g. overspecification, but I think the SML reference implementation is at least potentially a better compromise than you seem to be suggesting. With a little care, I would think that the core of the reference implementation could be kept somewhat independent and closer to a formal semantic description than the rest of the implementation. (If that hasn't been done, well, let's all rag on Dave... ;-)

I didn't mean to suggest it's a bad compromise: the ES community seems to be taking semantics seriously, so this is a good first step toward the mainstreaming of semantics. I can't think of any better choice than SML (except maybe Twelf :)) for this purpose.

My critical point is just that the implementation looks a little too far from an implemented operational semantics. My cautionary points are that (1) claiming too much formal semantic value from a reference implementation (notice that Dave doesn't claim so much in his OP) might raise expectations so high that it is set up to be perceived as a failure when it can't deliver; and (2) op. sem. really isn't so hard for programmers to learn to read and understand (for a good example of the mainstreaming of this sort of specification, see the XQuery and XPath formal semantics).

"formalized" is not a formal property

...claiming too much formal semantic value from a reference implementation (notice that Dave doesn't claim so much in his OP)...

I pretty much agree. Our decision to go to SML was a pragmatic one; it was motivated by the fact that there were many bugs in previous versions of the spec due to the fact that the pseudocode reference "language" was not executable and therefore untestable. We chose SML because it's 90% lambda calculus, its few choice side effects are almost exactly what we need to specify ECMAScript's features in a modular way (cf On the Expressive Power of Programming Languages), and most importantly, it's already implemented.

I agree that Paul is overstating the "formality" of this approach. "Formalized" is not a transitive property; indeed, it's not even a formal property. I would instead put it this way: we have chosen SML as a nice intermediate meta-language for specifying our language.

The fact that SML is formally specified gives us some level of confidence, but really you would probably need to translate our reference implementation down to the lambda calculus via a monadic translation (the "SML/NJ monad" with non-termination, mutation, exceptions, and first-class continuations). This is painful enough that I wouldn't want to do it by hand, and I don't think it's a good idea for the specification. It's a trade-off, of course, because it means you have to understand the semantics of ML's side effects to understand the specification of ES4. It's my opinion that the side effects in SML are sufficiently simple and well-understood that this is the right side of the trade-off.

So all in all, it's not a formal semantics, but still just a (formalizable) reference implementation. When read as a semantics, there is a danger that it overspecifies the language, as most implementations do.

Which is why the reference implementation is not the normative part of the spec. Just as the draft R6RS provides an informative (i.e., non-normative) operational semantics, we will probably provide the reference implementation as a non-normative appendix or companion document. We will, however, use portions of the reference interpreter inline in the spec (i.e., the normative document), but will drop into English where appropriate to avoid overspecification.

Overspecification is definitely a hard problem in language definitions. We don't claim to have any special solutions for it. When all else fails, the state of the art in deliberate unspecificity seems to be English.

Thanks...

...for cleaning up after me, Dave. :-)

I should definitely have been more careful with my terminology. The point I was driving at wasn't that the reference implementation and the intended semantics of ECMAScript were isomorphic; I'm very much aware of the dangers of overspecification. Rather, my point was the weaker one (but still helpful, as I think we all agree) that anyone who isn't sure what the underlying SML implementation of some aspect of ECMAScript implies has a formal semantics to fall back on. That's not to say that falling back on it in its full glory wouldn't be painful, or that the SML semantics might not include things that are unrelated to the intended semantics of ECMAScript, so yes: caveat emptor. The bottom line is that I agree with Dave: the choice of SML was a pragmatic one.

Overspecification

Overspecification is definitely a hard problem in language definitions.

Why so? I understand that it can become a problem in library specifications. But IMO, a language semantics can never be too precise (except for inherently non-deterministic aspects like concurrency, of course).

Incidentally, SML has a completely specified operational semantics - up to resource limitations, there is not a single question left open as to how a program behaves. I have always considered this a big plus, even though it sometimes makes the life of implementors slightly harder. Obviously, it also is one of the properties that makes it particularly well-suited for doing reference implementations.

"inherently non-deterministic"

...is in the eye of the beholder.

As is the distinction between language and library. :)

Mh...

Mh, I can agree with the latter, but I'm not so sure I'm buying the former. I see a clear difference between real non-determinism and mere underspecification. I'm sure it would show up in a formal semantics.

Can you give an example of something you'd prefer to keep underspecified in ECMAScript? And why?

Evaluation order

To me the classic thing to want to underspecify would be evaluation order and/or time/space complexity. That would seem hard to do with a reference implementation, except perhaps by relying on underspecification and/or non-determinism in the reference language, but that also seems like a rather ugly thing to do, in my eyes.

Evaluation order

To me, evaluation order is the classic example of something I specifically do not want left unspecified, because that's a perfect recipe for subtle bugs and incompatibilities, and the alleged potential performance win tends to be negligible. For something like ECMAScript it obviously would be a big mistake.
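As a small illustration of what a fully specified evaluation order buys, here is a sketch in SML, whose definition fixes left-to-right evaluation of tuple components (infix operands are passed as a pair, so they follow the same rule). The names `counter` and `next` are hypothetical. If the order were left unspecified, both results below could legitimately differ between implementations.

```sml
(* A counter whose reads are observable side effects. *)
val counter = ref 0
fun next () = (counter := !counter + 1; !counter)

(* Tuple components are evaluated left to right, so this is (1, 2). *)
val pair = (next (), next ())

(* The operands of "-" are evaluated left to right: 3 - 4 = ~1.
   Under an unspecified order this could also have been 4 - 3 = 1. *)
val diff = next () - next ()
```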

Complexity is a more interesting example. Most language specifications do not talk about it, and conventional formal semantics doesn't model it. Of course, a reference implementation won't either (unless you claim it to), so it is not a problem with that approach.

I was thinking about

I was thinking about Haskell, and how it's not defined to be lazy but just generally non-strict, allowing for speculative evaluation.

and call-by-name...

and call-by-name...

Overspecifying the unobservable?

Mh, would that count as overspecification? The difference is not observable. So for practical purposes, using a lazy semantics wouldn't actually be overspecifying anything, in the sense that it would not restrict implementations.

In fact, if you were to formalise Haskell, you indeed would have to make a choice. You'd probably go with call-by-name, though, as that is simplest, while giving the same meaning to Haskell programs as call-by-need would.

This would be true if

This would be true if Haskell didn't have an IO monad. It's still true denotationally, but whether or not you require sharing and to what extent is a significant decision.
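To make the call-by-name versus call-by-need point concrete, here is a minimal Python sketch (not Haskell, and not a formal semantics) in which a function uses its argument twice. Both strategies compute the same value; they differ only in how many times the argument expression runs, and the counter, a side effect standing in for IO, is the only thing that reveals the difference. The class and function names are hypothetical.

```python
from typing import Any, Callable, Tuple


class ByName:
    """Call-by-name argument: the expression is re-evaluated at every use."""

    def __init__(self, compute: Callable[[], Any]) -> None:
        self._compute = compute

    def force(self) -> Any:
        return self._compute()


class ByNeed:
    """Call-by-need argument: evaluated at most once, then the result is shared."""

    def __init__(self, compute: Callable[[], Any]) -> None:
        self._compute = compute
        self._evaluated = False
        self._value: Any = None

    def force(self) -> Any:
        if not self._evaluated:
            # First use: evaluate and remember the value, much like
            # updating a heap cell in an abstract machine.
            self._value = self._compute()
            self._evaluated = True
        return self._value


def twice(arg) -> int:
    # Uses its argument twice; the strategy decides how often it is computed.
    return arg.force() + arg.force()


def run(strategy) -> Tuple[int, int]:
    """Return (result, number of times the argument expression ran)."""
    calls = 0

    def argument() -> int:
        nonlocal calls
        calls += 1  # a side effect, standing in for IO
        return 21

    result = twice(strategy(argument))
    return result, calls


print(run(ByName))  # (42, 2)
print(run(ByNeed))  # (42, 1)
```

Without the counter, the two runs would be indistinguishable, which is the sense in which the choice is unobservable for pure code.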

IO monad

Can you elaborate? I'm not sure I see how the IO monad changes the situation (not counting abominations like unsafePerformIO, which does not even leave the static semantics intact).

Regarding sharing: neither time nor space complexity is modelled by conventional semantics, so how can we argue about sharing at this level?

The IO monad gives you means

The IO monad gives you means to observe the evaluation process - even if it's as primitive as peek and poke. To put it another way, it lets you observe the space and time behaviour on the actual hardware you're running on.

You can't discuss sharing without having a heap, but once you've got one it's easy enough. Various abstract machines for evaluating non-strict languages manage to specify it sufficiently.

over-specification

By and large, ECMAScript leans on the side of specifying rather than leaving things unspecified; efficiency is not the #1 priority (though not a non-priority), and compatibility is the most important consideration in everything. So I can agree, at least for ECMAScript, with the spirit of your point, but not its absolutism.

One example where we've talked about leaving something unspecified is in user-defined type conversions. If we assume that type conversions are idempotent, then we can avoid performing them twice. In some cases, this is critical; for example, for tail recursion, this can be used to prevent accumulation of return-type-conversions on the stack. However, since users can write their own type conversions, there's no guarantee that they are idempotent, and dropping redundant conversions can be observable. We are considering making idempotence the programmer's responsibility to enforce, thus allowing implementations to optimize away redundant conversions.
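The idempotence issue can be sketched with a chain of nested returns, each applying the same return-type conversion, as happens with tail recursion. The "optimized" variant keeps just one conversion, which is only sound if the conversion is idempotent. This is a hypothetical Python model, not how any ECMAScript implementation works; the CSS-length conversions are invented for illustration.

```python
from typing import Callable, Union

Value = Union[int, str]


def to_css_length_buggy(v: Value) -> str:
    # Hypothetical user-defined conversion that is NOT idempotent:
    # converting "10px" again yields "10pxpx".
    return f"{v}px"


def to_css_length(v: Value) -> str:
    # Idempotent version: converting an already-converted value is a no-op.
    s = str(v)
    return s if s.endswith("px") else s + "px"


def return_chain(conv: Callable[[Value], str], x: Value, depth: int) -> Value:
    # Unoptimized: each of `depth` nested returns applies the return-type
    # conversion, so `depth` pending conversions pile up.
    v = x
    for _ in range(depth):
        v = conv(v)
    return v


def return_chain_optimized(conv: Callable[[Value], str], x: Value, depth: int) -> Value:
    # Optimized: keep a single conversion. Only valid if conv is idempotent.
    return conv(x) if depth > 0 else x


print(return_chain(to_css_length_buggy, 10, 3))            # 10pxpxpx
print(return_chain_optimized(to_css_length_buggy, 10, 3))  # 10px
```

With the idempotent `to_css_length`, both variants return "10px"; with the buggy one, the unoptimized chain returns "10pxpxpx", so dropping the redundant conversions is observable.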

In a sense, any time something in a language is unspecified and the programmer's responsibility, you can call it "unsafe." But "complete" specification of a language (for some definition thereof...) is not always a slam-dunk.

Will Clinger made some thoughtful comments about over-specification last year on the R6RS mailing list. It's interesting reading.

Thanks

OK, I see your point about conversions, although I'm not convinced that such an optimisation is worth dropping definedness in the case of a language like ECMAScript. But similar issues can arise whenever you add some user-definable implicit magic to a language. For example, in ML-like languages you have a closely related problem with views or type classes, at least when you add effects to the mix. I still find it preferable to fully specify behaviour, but there is a cost for which YMMV.

Also thanks for the link, which indeed is interesting, though I do not agree with all his points (and I notice that almost all of the issues mentioned are in the library realm :-) ).

[Edit: Interestingly, some of these problems, including the one with conversions in ECMAScript, won't appear in typed languages, because they enforce much stronger invariants and exclude many pathological situations.]

SML, not OCaml

I read that other LtU thread and discovered that plans changed from OCaml to SML.
I think OCaml would have had fewer of the porting problems that are so graphically expressed on the download page. (There is OCaml for the Unixes, Mac OS X included, and Windows. And there is F#, which is basically OCaml for .NET.) And implementing JS 2.0 would not require you to depend on any platform-specific stuff, I think.

The porting problems would not have existed with OCaml. (I couldn't get a download and take the kid for a spin.)

Nice work, there! Keep it up! (And you know what they say - if it remains JavaScript, it remains the best language with braces.)

Porting problems

There is OCaml for the Unixes, Mac OS X included, and Windows. ... The porting problems would not have existed with OCaml.

SML implementations are available for the platforms you mentioned.

And there is F#, which is basically OCaml for .NET

Yeah, except that F# is not OCaml.

Porting problems

SML implementations are available for the platforms you mentioned.

Yes, but I imagine there are some important testers of JS 2.0 who would want to get this build and run it, but they can't because they run Linux. There is source, yes, but building it requires them to have SML installed on their system, which raises the bar to contribution. I imagine an OCaml program could easily be compiled to a binary and made available for all those platforms. It is hard to imagine why there is no Linux build. I could be wrong, anyway, but ...

Yeah, except that F# is not OCaml.

No, it isn't. But one can write the OCaml stuff that defines a language in F#. They are deeply similar, nearly identical. And that would mean an implementation of JS 2.0 that runs on .NET (which includes Mono, and therefore you'd have a JS 2.0 implementation that runs on Linux and Win32 and Mac OS X - one code base).
The choice of an ML-family language was because of the ML-ness, which is shared between OCaml and F#.

The reason that was given for moving to SML (in the linked thread) from OCaml didn't convince me, even though I still think SML is a fine replacement.

I salute the work that team has put into it, and I feel they are setting a very good and important trend for many other specification teams.

ES4 and Linux

I just installed SML/NJ 110.65, grabbed the ES4 sources and compiled it (make) on my Debian Linux machine. I also used heap2exec to make an executable binary of the produced heap file (not sure if the makefile already does that). And now I have an es4 interpreter. I had no problems whatsoever. (Now I'll just need to learn the ES4 language.)

make exec-release

make exec-release produces an executable (in the subdirectory exec) using heap2exec. We haven't released on Linux simply because we haven't set up a Linux box yet.

this is a *pre-release*

There's no Linux release because we haven't set up a Linux box. There's no Windows port because we haven't created it yet. None of this has anything to do with SML vs. OCaml. This is a *pre-release*, which means we're offering what we have available now, but it's not an official release.

not problems, really

We aren't really having porting problems. This is just early in the development process. The MLton compiler does target Windows (even without Cygwin), so we expect to be able to compile for Windows.

OCaml and SML are similar enough for our purposes, so it wouldn't have made a huge difference either way. However, we do intend to use just a little bit of callcc (for one orthogonal piece of the language, namely generators), and for that OCaml just doesn't have a very reliable solution. Of course, Standard ML doesn't officially have callcc either, but both SML/NJ and MLton support it.
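One way to see why callcc helps with generators: a generator has to capture "the rest of its body" at each yield and resume it later, which is exactly what a first-class continuation is. Python has no callcc, so this sketch makes the continuation explicit by writing the generator body in continuation-passing style. The `CPSGenerator` class and `count_to` example are hypothetical and say nothing about how the ES4 reference implementation actually does it.

```python
from typing import Any, Callable, Optional

Thunk = Callable[[], None]


class CPSGenerator:
    """A generator whose body is written in continuation-passing style.

    The body receives two callbacks:
      emit(value, k): suspend, producing `value`; k() resumes the rest of the body
      done():         finish the generator
    Saving k instead of calling it is what makes the generator suspendable.
    """

    def __init__(self, body: Callable[[Callable[[Any, Thunk], None], Thunk], None]) -> None:
        self._resume: Optional[Thunk] = lambda: body(self._emit, self._done)
        self._value: Any = None
        self._finished = False

    def _emit(self, value: Any, k: Thunk) -> None:
        self._value = value
        self._resume = k  # capture the continuation for the next call

    def _done(self) -> None:
        self._finished = True

    def __iter__(self) -> "CPSGenerator":
        return self

    def __next__(self) -> Any:
        if self._finished or self._resume is None:
            raise StopIteration
        resume, self._resume = self._resume, None
        resume()  # run until the body emits a value or finishes
        if self._finished:
            raise StopIteration
        if self._resume is None:
            raise RuntimeError("generator body returned without calling emit or done")
        return self._value


def count_to(n: int) -> CPSGenerator:
    # Hypothetical example body: yields 1, 2, ..., n.
    def body(emit, done) -> None:
        def step(i: int) -> None:
            if i > n:
                done()
            else:
                emit(i, lambda: step(i + 1))
        step(1)
    return CPSGenerator(body)


print(list(count_to(3)))  # [1, 2, 3]
```

With callcc, the body could be written in direct style and the continuation captured implicitly at each yield, instead of threading `k` by hand as above.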

We will get to the point where you can "get a download and take the kid for a spin," promise. :)

Oh. Good, Then.

Generators are some of the things I'll love about JS 2.0. :o)

How stable?

Dave, I'm starting to work on static analysis of JavaScript, more precisely on the security of Firefox extensions. So I'm wondering whether I should carry on with the unreadable ECMA-262 or whether I can skip it and go ahead with your new version.

wonderful!

That's great to hear! At this point the reference implementation is still very preliminary, and there aren't really any mature specification documents to go off of. The current documentation is probably not very readable so far, but we hope Edition 4 will end up much more readable than Edition 3.

So the short answer is: go ahead and start playing with it, but keep in mind it's rough around the edges.

Will do

I guess my first step will be to try and replicate your type analysis with slightly more generic functions.

By the way, do you have a mozilla.org nick?

Reviewing the code

I glanced very briefly (a couple of minutes) through some of the ES4 interpreter sources and noticed several issues and places where things could be improved (in one way or another). I don't have the time right now, but if you're interested, I could review the code (spend a couple of hours reading through it) after next week. I'd prefer to present the results of such a review on IRC, for example, so that I wouldn't have to write a long monograph and misunderstandings could be cleared up quickly.

For example, one correctness issue that I noticed immediately is the handle expression in the evalDoWhileStmt function in eval.sml at line 4157. The problem is that the scope of the handle expression extends over the entire (dynamic execution of the) loop. Every iteration of the loop therefore installs an exception handler, so the loop does not run in constant space. For background, I'd recommend reading Benton and Kennedy's article on Exceptional Syntax (google finds it).
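Some background on the space problem: when a loop is written as a recursive function and the recursive call sits inside a `handle`, the call is no longer a tail call, so each iteration keeps a handler (and a stack frame) alive until the loop finishes. The remedy along the lines of Benton and Kennedy's exceptional syntax is to scope the handler over only the code that may raise, with the rest of the loop outside it. Python has no tail calls, so this sketch just shows the narrowed handler scope as an explicit loop over hypothetical `body`/`cond` callbacks and signal exceptions; it is not the reference implementation's code.

```python
from typing import Callable


class BreakSignal(Exception):
    """Hypothetical exception a loop body raises to exit the loop."""


class ContinueSignal(Exception):
    """Hypothetical exception a loop body raises to jump to the condition test."""


def eval_do_while(body: Callable[[], None], cond: Callable[[], bool]) -> None:
    while True:
        # The handler covers a single execution of the body. Once this
        # iteration's try block is left, its handler is gone, so handlers
        # never accumulate across iterations.
        try:
            body()
        except ContinueSignal:
            pass  # 'continue' in a do-while still evaluates the condition
        except BreakSignal:
            return
        # The condition (and the next iteration) run outside the handler.
        if not cond():
            return


# Usage: many iterations with no growth in handler or stack depth.
count = 0


def step() -> None:
    global count
    count += 1


eval_do_while(step, lambda: count < 100_000)
print(count)  # 100000
```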

Edit: I should add that the overall clarity of the code is very good, IMO.