Compile-Time Execution in an Object-Oriented Language

I'm designing a generic object-oriented language, and I would like to replace all of those silly declaration modifiers that you often see (e.g. public, protected, private, internal, const, static) with annotations. My plan is to introduce a preprocess phase, executed immediately after compilation, that allows the program to verify that the semantics associated with the various annotations are respected.

This phase would simply be a call from the compiler to some "preprocess" function, which uses the language's own reflection facilities to let the programmer walk the code and verify things. This seems relatively straightforward, but I am concerned that there are well-known pitfalls that, in my infinite naivete, I may be overlooking. There is of course the really obvious stuff, like the possibility of a non-terminating or intractably complex algorithm at compile time, but I can live with that. Thanks in advance for your suggestions.


What sort of annotations

Many OO languages already verify (at compile time) that properties like const/final, access control, etc. are respected. Some let you get around these if you try (C++), others don't (Java). (And of course JVMs do this on the compiled bytecode as well, albeit at runtime).

What sort of annotations did you have in mind? Arbitrary predicates on terms? Certain predicates or properties are relatively easy to verify; OTOH, verifying that a given function terminates and verifying that an arbitrary runtime invariant holds are both undecidable.

And not to be rude at all, but why are declarations like public/private "silly"? Some of the security/encapsulation models that OO languages have may be baroque (or easily defeated, or both), but they do what they intend to do. Are you thinking of replacing ad-hoc tags with a more generic (and elegant) mechanism?

One last thought... maybe these annotations, of whatever sort, can be integrated into the type system somehow? They always seem to be "bolted on"....

Ad-hoc Tags vs. Ad-hoc Keywords

Many OO languages already verify (at compile time) that properties like const/final, access control, etc. are respected.

I believe that it would simplify both the language specification and the compiler, as well as give the language more flexibility. I find it tempting to add new function declaration modifiers to the language when they all do more or less the same thing: they ask the compiler to throw an error if the programmer violates some simple predefined semantic rule.

What sort of annotations did you have in mind? Arbitrary predicates on terms?

I'm only thinking of simple tagging for now.

OTOH, verifying that a given function terminates and verifying that an arbitrary runtime invariant holds are both undecidable.

Yes, certainly.

And not to be rude at all, but why are declarations like public/private "silly"?

As you say, they are both baroque and easily defeated. When I imagine the alternative, a programmer-defined semantic verifier, the current system starts to seem very primitive. I look at it this way: why does C++ have only three levels of visibility modifier (public, protected, private)? What if I want more?

Are you thinking of replacing ad-hoc tags with a more generic (and elegant) mechanism?

It is tempting for language designers to add new keywords (e.g. public, static, const, internal, abstract) to the language when they all do more or less the same thing: they ask the compiler to throw an error if the programmer violates some simple predefined semantic rule. However, I do think that ad-hoc tags are more generic than fixed ad-hoc keywords. For most purposes a programmer can turn to a standard library if they want some kind of standard predefined semantics for these tags.

One last thought... maybe these annotations, of whatever sort, can be integrated into the type system somehow? They always seem to be "bolted on"....

That sounds very appealing: a sufficiently powerful type checker could be used as a verifier like the one I describe. I'm wondering, though, what it would buy us compared to something more mundane like a self-executing verifier in the code. Any thoughts on how a tighter integration with the type system could be helpful?

Active Libraries, which is

Have a look at Active Libraries, which is probably more than you need, and Meta-Compilation of Language Abstractions, which sounds like the right fit for what you have in mind.

Great links

Thanks for pointing these out to me. They both discuss techniques that are much more powerful than what I need; I am only looking at verification, not code generation, at compile time. These links will probably help provide some ideas and arguments for doing what I am doing. Of course, it is probably only a matter of time before I start wanting to support multi-stage computation.

I think the Meta-Compilation

I think the Meta-Compilation thesis is closest to what you want. It allows a "library" access to the AST of code that uses it, so you can inspect and transform that code at will. You seem to be interested only in inspection at the moment, but, as you say, it would be nice to have an extensible mechanism that allows some growth as well.

Not meta-compilation, but initialization

My language, Virgil, has no "meta-level": the program cannot access the representation of itself or its types. It can, however, perform Turing-complete computation at compile time. There is, in fact, no separation between the runtime language and the compile-time language; the compiler has a built-in interpreter that can run arbitrary initialization code.

This comes in handy for embedded systems like microcontrollers, where any kind of dynamic memory allocation is problematic: by allocating and initializing everything at compile time, the compiler can analyze and optimize the heap and code together into a very tiny binary. Virgil can run on microcontrollers with as little as a few bytes of RAM.

The Virgil Webpage.

I am working on a new version of the language and am building the compiler in the language itself, bootstrapping with a modified version of the older compiler's frontend and the built-in interpreter. Soon the new compiler will be able to compile itself to Java bytecodes, and then there is no stopping :)

Virgil was discussed on LTU

Virgil was discussed on LTU before. Great to see progress!

Virgil looks very

Virgil looks very interesting. What motivated the design choice not to support meta-programming and reflection?

A language for systems people, not languages people

The main reason was that I didn't want to pollute the language with too many built-in concepts that come with a cost (e.g. built-in intrinsic classes, types, values, system routines), both in terms of intellectual complexity and space in the final compiled program. This means that the programmer gets a "clean slate" from which to build everything from the ground up.

That of course makes exposing the representation of the program a bit difficult, since in order to have values and types that the program itself can manipulate, the compiler must map its internal representation onto types in the language.

There are a couple more reasons I am against the idea (at least for my language):

i.) Exposing the AST exposes a particular compiler implementation. While this can be standardized, it both reduces the compiler implementor's freedom and increases his/her implementation burden. Implementors not only have to parse and typecheck correctly, but also have to produce a data structure with an _exact_ representation of the code according to some spec.

ii.) Most programmers are either not competent enough to be trusted with this, or are simply not geared to that mindset. Automatically transforming the program at a meta-level opens up not only the gates of heaven, but the sheer pits of hell. As a potential user of other people's code, I definitely _do not_ want to debug their meta-whiz-bang transformation along with the original program it operates on.

iii.) The final program might be extremely hard to reason about and/or debug.

Overall, and here is where I quibble with the abstract idea you are describing, I don't think that the costs in both implementation and intellectual complexity are really worth the potential benefit of meta-compilation and processing.

Compilation and program transformation aren't child's play, so at least in my case I was careful not to give out too much rope. For the version of Virgil I did for my thesis work, meta-compilation was out of scope; however, even now that I am trying to make it more general, I don't plan to wander into that territory.

Static?

I'm designing a generic object-oriented language, and I would like to replace all of those silly declaration modifiers that you often see (e.g. public, protected, private, internal, const, static) with annotations. My plan is to introduce a preprocess phase, executed immediately after compilation, that allows the program to verify that the semantics associated with the various annotations are respected.

By static, do you intend the C usage of 'file local'? Are you just interested in providing a scheme for generic access modifiers, or do you envision this applying to the other usages of static as well? (If so, I'm not sure how you'd do that as a post-compilation step.)

On Static

By static, do you intend the C usage of 'file local'?

No. I was thinking of the C# (and, I believe, Java) meaning: a class method. One approach to having static methods without an explicit declaration was used in Delphi: a null "this" pointer was passed to a method when it was called without an object instance (e.g. Class.quux() instead of Object.quux()). Accessing the "this" pointer in such a method was then a run-time error.

So far I believe that I can use a compile-time verification step to manually verify the semantics of the following C# keywords (sorry not to use Java as an example, but it is not fresh in my mind and I risk making mistakes):

public / protected / private / internal / abstract / virtual / override / static

There are other additions I can imagine such as "readonly".

Static members?

I can see how you'd use that trick for static methods, but not static members. Static members affect code generation and are not just something to be verified (consider, for example, putting the modifier on a variable that's incremented in a class constructor). But maybe you intend to break from tradition and give static a single meaning in your language?

Singletons

A very clean alternative to Java-style statics is to have language-level singletons. Such singletons work just like any other objects (they can implement interfaces, have an implicit this/self reference, or whatever else makes sense in your language). The difference is that there's always exactly one instance of the type and, by implication, the definition of the type can't have constructor parameters. This might tie into your tagging concept well: the tag can generate the necessary scaffolding for instantiating the type and giving the instance the same name as the type (assuming objects and types live in different namespaces).

Why Tags?

I should point out that there is a secondary purpose to the tagging mechanism I describe: as a programmer, I want to be able to group methods by purpose across classes, so that I can express cross-cutting concerns (as in aspect-oriented programming).

For example, there may be a set of functions in various classes that I want to appear in a log or trace. It would be nice to simply add a tag (e.g. "trace") to these methods. The reflection API could then be used in run-time debug builds.

Java annotations

This sounds fairly similar to Java annotations. Of course, in Java annotations were a post hoc feature, so they are a separate addition to the built-in modifiers (public/private/static, etc.), and the orientation is towards supporting third-party tools. Still, it seems relevant.

Similarly for C#

Similarly for C#. Experience suggests that C# probably did a cleaner job than Java.

Details?

Hi Derek,

Would you mind providing some exposition on your opinion? In general, I'm fairly fond of the notion that you can use "annotations" and "annotation processors" [1] to implement a form of pluggable type system, so I'd love to hear any insight you have here.


[1] Where "annotation processor" means something that has access to the program AST, like Sun's Tree API.