A persistent AST for multi-purpose use? (Loyc)

I have been hoping to build a full-featured PL for a long time now--since about 2000. I wanted to make a language that was better than all the other languages I had used, because they all had deficiencies that I found irritating to no end. I had grown up on mainstream languages like BASIC, VB, C, C++, Java and C#, so it'll surprise no one at LtU that I was disappointed in my PLs. After much study, I realized I would not be able give a language all the features anyone might want or need--heck, I could barely grasp functional programming, or LISP or Prolog, even though I recognized that these languages were useful and implemented important ideas.

So, I decided my goal should be to make an extensible language into which anyone could add new features. Later, I refined this idea and decided I would like a multi-syntax compiler that mimics popular languages; initial supported syntaxes would be C# and boo. This is important because I wish to target my compiler at a mainstream audience, and I figure the easiest way to convince developers to switch compilers is by compiling a superset of their existing language. Does anybody think such an unwieldy language as C++ would have been a success if it had not been compatible with C?

I chose the .NET framework because it's multi-language friendly, with performance-enhancing features Java lacks (e.g. value types, generics) and so that I wouldn't have to write my own runtime or libraries. I'm just one guy, after all!

My job, then, is not primarily to create a programming language with new features, but rather to come up with a powerful interface by which anyone can come along and add features to the compiler. The form of an extension will be a DLLs that somehow cooperates with Loyc and all the other extension DLLs.

The project is called Loyc: Language of your choice. The initial implementation will be in C#.

Right now, I'm hung up on its design. In particular, the AST. I suppose the biggest problem is that I'd like to support a rich IDE for code editing. Among other things, such an IDE should allow

- Code refactoring tools that may make arbitrary changes to the AST, which are propagated back to the source code without disturbing comments or spacing in code that is not changed
- Analysis of code to produce dependency graphs, call graphs, and other information in real time
- Most ambitiously, I would eventually like to be able to run code "at design time", though I haven't developed this idea very much

I have some ideas for propagating AST changes back to source code, but the problem I'm hung up on is that there will not be only one AST, but a sequence of them produced by incremental changes. In order to run code, or to fully analyze it, it will be necessary to transform the AST from more "advanced" to more "simple" structures. For example, the .NET framework has no direct support for closures. Therefore, they must be eliminated before the code can run. If the AST contains a method like this:

Action<int,int> foo() {
int x = 0;
return delegate(int y) { return x += y; }
}

it must be transformed to something like

class foo_shared_data {
int x = 0;
int anonymous_fn(int y) { return x += y; }
};

Action<int,int> foo()
{
var locals = new foo_shared_data();
return locals.anonymous_fn;
}

I wish to perform this transformation without losing the original tree, but, to improve performance and decrease memory use in most cases, I want to avoid making a complete copy of the tree. If possible, I would like the public interface to give the impression of incremental mutative changes, as is conventional in imperative languages. Basically, I would like a kind of persistent tree data structure where "checkpoints" can be taken to preserve a snapshot of the tree, without actually copying it. I expect the number of checkpoints to be small, though it would be nice if a large number of checkpoints could be managed efficiently without an explosion in memory use. Can someone suggest algorithms and data structures that would help?

Comment viewing options

Write in a functional style?

Perhaps I'm not understanding something, but I don't think you really need a specific algorithm or anything. Just write your AST data structures in a functional, non-mutating style. The "purely functional" page on Wikipedia would be a nice place to start. The papers referenced in the external links would be the next place to go.