Neko 1.0

Hi,

I have just released Neko 1.0, an intermediate programming language with its own embeddable virtual machine. It might be very interesting for language designers, since Neko has been designed so that you can compile your language to Neko and then reuse the virtual machine and libraries it provides. You can then concentrate on the design and let Neko take care of the runtime.

More information at http://nekovm.org.

If you have questions or comments, please feel free to ask.

Executive Summary please

If you could give us some idea of what makes Neko unique or better than something like .NET or LLVM (which target different domains, but I'm not sure what Neko targets here), I might be more willing to take a look at it. As it is, I just don't have the time to dig into it and figure out what's interesting, if anything.
So for now I just think ".NET clone" and move on.

Neko VS LLVM / DotNet / Parrot

Yes, that's one of the questions I'm being asked quite a lot right now. Since it's not a very good idea to answer the same thing every time, I'll try to summarize things in a FAQ and add it to the website. Stay tuned.

FAQ

A FAQ is now available.

Cool, thanks

That answers a number of questions. I'll take a somewhat deeper look, since your lack of a standard type system intrigues me.

s-expressions

If the language is not meant to be read or written by people but by generators and parsers, wouldn't it be much easier to use if it used s-expressions as its syntax?

Sugar

At one point I considered having a Lisp syntax for the language, but I preferred having more syntactic sugar available for the "base" Neko language, while still keeping it a lot simpler than, for example, C.

Since the in-memory AST is regular, I might add another regular syntax input to the compiler, one which could be printed back to a Neko program and which would be able to carry the original file positions (from before generation). That might be very useful for upcoming debugging features such as stack traces and a debugger.

Whether to do it with S-expressions or XML is not yet decided.

His syntax is LL(1)

His syntax is LL(1); it's barely harder to parse than s-exps.

Maybe even easier...

S-exps are trivial to parse into a tree, but you then need a lot of additional machinery to verify that the resulting tree is a well-formed syntax tree in your language. A richer syntax can move much of that work into the parser itself, which can then output only well-formed ASTs. This also frees you to design data structures only for your language's abstract syntax rather than for all possible trees.
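
To make that concrete, here is a minimal OCaml sketch (the types and constructor names are made up for illustration):

(* A generic s-expression can represent any tree shape, so
   well-formedness must be checked in a separate pass. *)
type sexp = Atom of string | List of sexp list

(* A dedicated abstract syntax can only represent well-formed
   programs, so the parser can produce these values directly. *)
type expr =
  | Int of int
  | Var of string
  | If of expr * expr * expr
  | Call of expr * expr list

With sexp, a malformed List [Atom "if"; Atom "x"] parses fine and has to be rejected by a later check; with expr there is simply no way to build an If node with a missing branch.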

All of this applies just as well to XML, of course.

None of this is really a big deal, though, and s-exprs/XML have other advantages to recommend them. It's just that "ease of parsing" is kind of a red herring.

good points

Yes, those are good points. But I was really thinking more about generation than parsing. I'm not sure, but it seems like a generator would often construct something like s-exprs as an intermediate step, and that parsers would then do this backwards. That seems like unnecessary work: why should my generator do something that some parser will undo anyway? Mind you, I'm not talking about just using a fully parenthesized prefix syntax as strings, but about using actual lists. That way there really doesn't have to be any parsing at all.

not necessarily...

I'm not sure, but it seems like a generator would often construct something like s-exprs as an intermediate step, and that parsers would then do this backwards.

It's true that a naive generator may construct something like s-exprs in the process of generating/printing code. But it's by no means necessary, nor is it really the best way to go about it. In a language with a rich datatype definition facility (whether OO, algebraic variants, or whatever) I'd expect to see a generator that used a set of data types matching the abstract syntax more closely than the concrete syntax, and that generated code directly from those types. Parsing is analogous, and there's really never a need to build a tree structure as general as s-exprs.

(I realize that this isn't a terribly good explanation, and I don't really have enough time to improve it. If you think examples would help, please let me know.)
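
For a rough flavor anyway, here is a tiny sketch in OCaml (types and names invented for illustration):

type expr =
  | Int of int
  | Add of expr * expr
  | Mul of expr * expr

(* Generate concrete syntax straight from the typed AST; no generic
   s-expression tree is ever built along the way. *)
let rec emit = function
  | Int n -> string_of_int n
  | Add (a, b) -> "(" ^ emit a ^ " + " ^ emit b ^ ")"
  | Mul (a, b) -> "(" ^ emit a ^ " * " ^ emit b ^ ")"

Here emit (Add (Int 1, Mul (Int 2, Int 3))) yields "(1 + (2 * 3))", and at no point does any intermediate list structure exist.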

syntax errors

Related question: Do modern yacc-alikes generate good error messages? Maybe someone can point me at one that does particularly well?

We've just made a new DSL with a yacc-alike parser, and I don't like the vague "Syntax error before 'foo'" messages it prints. The least bad solution I've found is to add extra productions that trap common syntax errors and print specific messages.

I don't know what yacc-likes do

I don't know what yacc-likes do, because I don't use them.

I try to keep my languages LL(1) and implement them with pure recursive descent. (Mostly, I use parser combinators in OCaml.) Then, when you get an error, you know exactly where it happened, you can see which rules were firing from the stack trace, and from the follow set you can tell which token was expected when the error happened. You can get very pleasant error messages this way.
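
A minimal sketch of the flavor in OCaml (all names hypothetical): an expect helper knows exactly which token it was looking for, so the error message can say so.

exception Syntax_error of string

(* Consume one expected token, or report precisely what was wanted. *)
let expect tok = function
  | t :: rest when t = tok -> rest
  | t :: _ ->
      raise (Syntax_error (Printf.sprintf "expected %S but found %S" tok t))
  | [] ->
      raise (Syntax_error (Printf.sprintf "expected %S but hit end of input" tok))

(* Recursive-descent rule for '(' ident ')'. *)
let parse_group toks =
  match expect "(" toks with
  | id :: rest -> (id, expect ")" rest)
  | [] -> raise (Syntax_error "expected an identifier but hit end of input")

Given ["("; "x"], this fails with expected ")" but hit end of input, naming exactly the missing token.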

Could you post an example please?

Could you post an example please? I'm curious to see what parser combinators look like.

Look up some Haskell papers on it

There are lots of parser combinator libraries, including many educational ones in Haskell. A nice one is the Utrecht parser combinator library, by Doaitse Swierstra I think. There are also a large number of papers on parser combinators (mostly monadic, some by Wadler); if you Google for it you should find them pretty easily.

The basic idea is that you have parsers which each parse a piece of the input (and might succeed or fail) and on success return a result. Parser combinators let you combine several primitive parsers to build more complex ones.

As an example in ML, from my own library, here is a collection of trivial character parsers which together parse a string token:

1: let lexer_string: token lexer =
2:    lexer_position <*> function p ->
3:    lexer_satisfy is_dquote_char <.> function _ ->
4:    star (lexer_satisfy (fun c -> not (is_dquote_char c))) <.> function l ->
5:    lexer_satisfy is_dquote_char <.> function _ ->
6:    success (token_string p l)

You can build parser combinator libraries as complex as you want; the idea stays roughly the same.


Uhm, edit: some short explanation was in order, I guess. I cheat a bit below, though; better to read a paper.

A combinator p0 <*> (function x -> p1) succeeds if both parsers p0 and p1 succeed. The result of p0 is passed to the second argument, which is expected to be a function taking that result and returning a parser. A combinator p0 <.> (function x -> p1) also succeeds if both parsers succeed; additionally, if p1 fails after p0 has succeeded, an error message is returned.

Line

  1. declaration of the function; the type says it is a lexer which returns a token.
  2. lexer_position is a parser which consumes no input but returns the current input position. This line is there to get the current position.
  3. lexer_satisfy p is a character parser which parses one character satisfying the predicate p and on success returns that character. This line parses one double quote character ".
  4. star p tries to parse zero or more occurrences of whatever the parser p parses, and returns the list of results of each successful parse. This line parses zero or more characters which aren't double quotes ". (I beautified the code a bit here.)
  5. parses the trailing "
  6. success x is the parser which consumes no input but returns x as a result. In this case it returns a token.
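
To make the semantics of <*> and <.> described above concrete, here is a minimal toy model in OCaml (types simplified and names invented; the real library certainly differs):

type 'a result = Success of 'a * char list | Fail
type 'a parser = char list -> 'a result

(* p0 <*> f: run p0, then feed its result to f to obtain the next parser. *)
let ( <*> ) (p0 : 'a parser) (f : 'a -> 'b parser) : 'b parser =
  fun input ->
    match p0 input with
    | Success (x, rest) -> (f x) rest
    | Fail -> Fail

(* p0 <.> f: like <*>, but once p0 has succeeded, a failure of the
   second parser becomes a hard error carrying a message. *)
let ( <.> ) (p0 : 'a parser) (f : 'a -> 'b parser) : 'b parser =
  fun input ->
    match p0 input with
    | Success (x, rest) ->
        (match (f x) rest with
         | Fail -> failwith "parse error after partial match"
         | ok -> ok)
    | Fail -> Fail

This is only a toy: a real library would thread positions and error messages through the result type instead of calling failwith.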

Lazy

I would just like to quickly eyeball a combinator-parser thingy for something resembling a programming language, to see what it looks like. If someone has one handy I'd be mighty obliged. Your example looks like line noise to me, I am sorry to say :-)

The good thing about yacc is that BNF source is easy to read and understand.

EDIT: My god, what a lot of code just to say "[^"]*". Nearly as bad as Shriram Krishnamurthi's code! (ducking :-))

You don't say

... there is no free lunch? ;-)

I added some comments to the lines of code, uhm, hope you can get the gist of it.

It actually is a lot of code for parsing what is a simple regular expression. The nice thing about parser combinators is that they let you build more complex parsers which take an arbitrary number of arguments, so at some point they become more powerful than plain yacc. For example, you could write a parser parse_regex r, have it parse "[^"]*", and then simplify the above code; or you could build a parser which parses, what do I know, all C code which contains the name "Shriram Krishnamurthi" and fewer than 5 gotos. ;-)
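
As a sketch of that kind of simplification, reusing the (hypothetical) names from the earlier example and inventing a generic between helper:

(* A reusable combinator: parse an open delimiter, a body and a
   close delimiter, keeping only the body's result. *)
let between open_p close_p body =
  open_p <.> fun _ ->
  body <.> fun l ->
  close_p <.> fun _ ->
  success l

(* The string lexer from before, now one line per concern. *)
let lexer_string =
  lexer_position <*> fun p ->
  between (lexer_satisfy is_dquote_char)
          (lexer_satisfy is_dquote_char)
          (star (lexer_satisfy (fun c -> not (is_dquote_char c)))) <.> fun l ->
  success (token_string p l)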

Hutton/Meijer

I found the Hutton and Meijer Functional Pearl to be a good introduction to parser combinators, as well as really making me appreciate monads. They end with this example:

expr    = term   `chainl1` addop
term    = factor `chainl1` mulop
factor  = digit +++ do {symb "("; n <- expr; symb ")"; return n}
digit   = do {x <- token (sat isDigit); return (ord x - ord '0')}
addop   = do {symb "+"; return (+)} +++ do {symb "-"; return (-)}
mulop   = do {symb "*"; return (*)} +++ do {symb "/"; return (div)}

It's still noisier than BNF, but the real power comes from noticing that this is both a parser and an evaluator for a language of expressions. For instance, the addop and mulop combinators parse the operators connecting terms and then return the operator itself, while the chainl1 combinator used in the expr production parses a sequence of terms separated by addops and does the equivalent of a foldl as it goes, so the result of the parse is the value of the expression. That's not necessarily always a good thing (you might want to separate parsing and evaluation more clearly), but it's a neat trick.
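
Concretely (using the paper's apply helper, which runs a parser after skipping leading space), apply expr "1+2*3" should reduce to [(7,"")]: term folds in the (*) first, giving 6, then expr folds in the (+), giving 7, with the empty string as leftover input.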

Parsec

Probably the most popular parser combinator library in Haskell is Parsec. Most people find it quite pleasant to use and unlike many other parser combinator libraries it does a bit more in the department of error messages. I think it's definitely worth your time to check out.