Katahdin: Modifying your programming language as it runs

Katahdin is a programming language where you can define new language constructs such as expressions and statements as easily as new types or functions. For example, you could define a new operator, a new type of loop, implement a syntax from another language that you like. After defining a new construct you can use it on the next line in the same file, so there is no need to recompile each time you want to add a new construct. Katahdin is powerful enough that you can define an entire existing language, or design a new language from scratch, making Katahdin a universal language interpreter.

For example, most programming languages have a modulo, or remainer operator. To define one from scratch in Katahdin it is only a few lines:

class ModExpression : Expression { pattern { option leftRecursive; a:Expression "%" b:Expression }

method Get() { a = this.a.Get...(); b = this.a.Get...(); return a - (b * (a / b)); } }

Katahdin is an application of parsing expression grammars and packrat parsing, an active area of research. The site contains a master's thesis and public domain source code.

A suggestion

Thanks for the contribution, this is a very cool and promising looking language!

Might I suggest using a slightly more sophisticated example than defining an operator to showcase your language?

What you have accomplished is very impressive, but the example you use is underwhelming. I could show modulo operator definition in Cat:

  define % { dup2 div mul sub }

And say "look it is just one line", but that is not a fair comparison because what you are doing (modifying the language grammar on the fly) is fundamentally more important. Your canonical example should showcase that.

Just my two cents. I look forward to reading more about Katahdin.

By cdiggins at Wed, 2007-06-20 17:28 | login or register to post comments

DSL

You really should mention DSL, because if it's one of the most practical uses for your language -- define and use DSL in the same module, at runtime.

Congratulations on a very cool project! I attempted to accomplish something similar a number of times, but always lacked the time (and probably expertise) to finish it, so I'm happy to see this done by someone. Cheers!

By mlk at Wed, 2007-06-20 22:51 | login or register to post comments

re: DSL

One thing that intrigues me about using Katahdin for DSLs is the possibility of writing code in your DSL as a Katadhin program, then adding in descriptions of the language "above" the DSL piece-by-piece.

You'd end up with the code "reading" from the bottom: the high-level logic described by the DSL, then the somewhat-lower-level constructs (main entities) above it, and the even-lower-level support (operators, syntax) above that.

I wonder what assistance Katadhin can actively provide during the process of writing that inverted pyramid? It'd be interesting if the Katadhin debugger could a) be extended to support the generated parser in a Katadhin program and b) could be made interactive with the development of the program itself.

Also, I'd be interested in the possibility of "exporting" the parser at a particular stage of a program (like freezing an environment), in a form that could be quickly bootstrapped into another copy of the interpreter. That would help avoid the overhead of re-interpreting the parser code for a given module. Bonus for merging two exported parsers into a single one!

By Wolf Logan at Thu, 2007-06-21 19:34 | login or register to post comments

Limiting the scope of DSLs

I'd like to see something similar to namespaces for limiting the scope where the parser extensions are active. What I was trying to accomplish with my attempts of implementing a reflexive parser / compiler is something like this (assuming host language is based on Python):

from languages import sql, cpp

get_items = sql.lambda parent: SELECT value FROM items WHERE parent=$parent

cpp:    
    int sum(int a, int b){ return(a+b) };

reduce(sum, get_items(a))

Obviously there's a gap in typing, so this example wouldn't work, but it should give you an idea, how different languages could be used together but still be somewhat isolated.

By mlk at Fri, 2007-06-22 00:17 | login or register to post comments

That puts me in mind of

That puts me in mind of Microsoft's DLR, although that is of course a very different kind of system. The DLR allows this kind of mixing at the member level (that is, an object can be defined in one language, have members defined on it in other languages, and be instantiated from yet a different language).

Now I'm thinking about the possibility of a DLR client language that acts as a meta-script to other DLR-based languages, taking marked code blocks and passing them out to other languages as needed.

By Wolf Logan at Fri, 2007-06-22 00:52 | login or register to post comments

DLR

DLR talks about mixing scripting languages, but it's really only a layer for sharing data and code. Each language still has a separate compiler, developed using traditional techniques. The languages cannot be modifed by the user.

By chrisseaton at Fri, 2007-06-22 09:08 | login or register to post comments

Katahdin does that

Katahdin does that with language statements. With the Python language module you can either write a program entirely in Python, or you can write just a part in Python, just as you describe. This program is written partly in FORTRAN and partly in Python:

import "fortran.kat"; import "python.kat";


fortran {

  SUBROUTINE RANDOM(SEED, RANDX)

  ...

  END

}
python {

  seed = 128

  randx = 0

  for n in range(5):

    RANDOM(seed, randx)

    print randx

}

Simpler languages like SQL are more seamlessly implemented. For example, with the SQLite module you can connect to a SQLite database and then query using SQL inline:

database = new Mono.Data.SqliteClient.SqliteConnection(...); database.Open(); films = database? select director from films where title = "Indiana Jones and the Last Crusade";

This is all code that Katahdin can run today! See my thesis, pages 9 and 45 for these examples.

By chrisseaton at Fri, 2007-06-22 09:06 | login or register to post comments

I'll read the thesis, but here's one question

I can see that the host language determines where the block of the other language ends, so it's not a full context switch, but rather a cooperation. I wonder if it is capable of nesting imported languages, for example if you were to embed some fortran code into a python function, would the boundaries of it be determined by indentation, like in python?

By mlk at Fri, 2007-06-22 19:15 | login or register to post comments

Pass-by-name is not Python

In this example, the FORTRAN code modifies the seed variable passed to it. This is normal in FORTRAN, but impossible in Python. This means that the language in the "python {...}" block is not Python.

This is important. In Python, if you pass a variable to a function, you know that its value won't change. (If it's an object reference, the contents of that object may change; but the variable won't refer to a different object.) Is it possible for Katahdin's Python support to make that guarantee?

By John Stracke at Tue, 2007-06-26 18:41 | login or register to post comments

Pass-by-name is not Python

It's true that python passes arguments by value, but this does not mean that a function can't affect the namespace of it's caller. Python's execution model includes the ability for code to gain access to the stack, traverse it, and modify the objects stored along it, including code and namespaces.

In particular, "if you pass a variable to a function, you *probably* know that its value won't change." There's plenty of Python code in the wild that's happy to (ab)use these features. This is one of the penalties paid for being dynamic and expressive.

I think it's impressive that passing arguments across the two, very different languages "just works" in a way that is fairly intuitive. If crossing that interface required special handling, the utility of mixing them so freely would be lost.

By J Meier at Wed, 2007-06-27 02:50 | login or register to post comments

It's the same language.

By Derek Elkins at Wed, 2007-06-27 04:54 | login or register to post comments

Didn't know that

Python's execution model includes the ability for code to gain access to the stack

Ugh. I like Python, and most of its metaprogramming is useful; but this just pegged my bogometer. It's reasonable to define f() such that f(x) can modify x; it is not reasonable for f(x) to modify y. That just breaks any chance of real modularity.

By John Stracke at Wed, 2007-06-27 14:15 | login or register to post comments

I don't get how...

...one can use something before defining it. By definition, since a construct does not yet exist in a program, there is no code using that construct. Therefore, how is it possible not to stop a program and define a new construct and use it at the same time?

By Achilleas Margaritis at Thu, 2007-06-21 11:47 | login or register to post comments

No need to stop the parser

Obviously you can't use something before it is defined. New constructs must be defined in the program source code before they can be used, but it can be just the previous line in the same file if you want.

Katahdin lexes, parses and executes in one pass. When you define a new construct, it is added to the grammar. The parser continues from the end of the definition, now using the modified grammar, and is able to recognise the new construct immediately.

Katahdin doesn't use a parser generator like most programming languages. The parser is just-in-time compiled instead, so we are free to recompile the parser whenever we want, including as it is running, just as Java is able to recompile (re-jit) a running program.

By chrisseaton at Thu, 2007-06-21 12:00 | login or register to post comments

Multi-pass parsing?

Obviously you can't use something before it is defined.

What about multi-pass parsing?

By cdiggins at Sat, 2007-06-23 07:13 | login or register to post comments

Types?

My immediate impresiion is that one cannot define a statically (or even just strongly) typed PL in Katahdin, so "universal language interpreter" should be qualified somewhat (universal typeless language interpreter?).
On the other hand, strong typing in interpreters may be a moot point...

An unrelated observation is that the shorter the name of the PL (phonetically), the more popular it becomes (compare C/Lisp to Oberon/Modula, but even the latter two try not to put two consonants together) :)

By Andris Birkmanis at Wed, 2007-06-27 02:47 | login or register to post comments

Typed languages

Not at all! I can define a method that is executed for each language construct before the program runs to check types. For example, for the add operator this method would check that the two operands had compatible types and return the type. For variable references the method would look up the declared type in a table. For simple integer or string expressions, the relevant types would be returned. Those methods would be recursively called before the program executes to verify the static typing. The program may be executed by the dynamically typed interpreter, but that wouldn't matter because the types would be verified statically. It would be entirely possibly to implement C-style static types.

By chrisseaton at Thu, 2007-06-28 22:39 | login or register to post comments

Some more datapoints

There are a few interpreted languages out there that allow you to access the remainder of the program as a text stream and parse it however you want. Some that come to my mind are Forth, TeX (maybe also MetaFont and MetaPost) and Postscript. This looks a little more structured however, nice.

By sigfpe at Fri, 2007-06-29 00:00 | login or register to post comments

I believe REBOL has a

I believe REBOL has a similar ability as well, but I may be mistaken - it has been a long time since I've looked at it.

By BigZaphod at Fri, 2007-06-29 00:34 | login or register to post comments

Lambda the Ultimate

User login

Navigation