Code Generation with Python, Cog, and NAnt

We've been using C# for a couple of years now, and are getting tired of the verbosity. Especially tired of copy/pasting and changing a couple of identifiers, and I imagine many other people are, too. After seeing some of the macro capabilities of Lisp, we got jealous. After some googling and browsing, I ran across Ned Batchelder's Python-based code generation tool, Cog.

A nice description of using code generation in real life. Might help explain the idea to programmers unfamiliar with the technique.
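For programmers who haven't seen Cog, here is a minimal sketch of its inline markers (the file and event names below are invented for illustration). The generator code lives in comments between [[[cog and ]]], and Cog writes its output before the [[[end]]] marker:

```python
# handlers.py -- run `cog -r handlers.py` to regenerate in place

# [[[cog
# import cog
# for ev in ["create", "update", "delete"]:
#     cog.outl(f"def on_{ev}(payload):")
#     cog.outl(f"    print('handling {ev}:', payload)")
#     cog.outl("")
# ]]]
def on_create(payload):
    print('handling create:', payload)

def on_update(payload):
    print('handling update:', payload)

def on_delete(payload):
    print('handling delete:', payload)

# [[[end]]]
```

The generated section is checked in alongside the generator, so a reader sees both the rule and its concrete expansion.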


Other Code Generation solutions

There are dozens of programs out there with a similar concept. MyGeneration is one of the better ones, giving you a range of languages to implement in.

Others are Codagen, CodeSmith, Jostraca, Workstate Codify...

You can write your own in a few days using Scheme, Lisp, Erlang, JavaScript, Oz... and of course Python as above.

Sure

You can write your own in a few days using Scheme, Lisp, Erlang, JavaScript, Oz... and of course Python as above.

Obviously. The hard part is understanding why such techniques can be useful, and when. Once you get that, the rest is easy.

But what percentage of programmers "get" code generation?

But what percentage...

Small, very small...

But when you do get it

and you are using good languages, like MetaOCaml, you can do some fairly cool things, even if I say so myself.

[That is a link to an unpublished but submitted paper of mine. Read at your own unrefereed risk.]

That's very cool. :)

weird

I would think that, out of the many esoteric disciplines and ideas found in programming, the benefits of code generation would be relatively clear cut. The 'when', I suppose, is always the sign of experience and craft with anything that is reasonably complicated.

Code generation is strong medicine

But what percentage of programmers "get" code generation?

Of those that do, how many realize you should avoid it if at all possible? ;-)

My position on this is the same as with all metaprogramming (macros, reflection, etc.): try to solve the problem within the language first, most likely by refactoring your design. Nine times out of ten or better, this gets rid of "cut and paste" code.

If you do use these techniques, because the problem is otherwise insoluble in the language, know that you have significantly increased the complexity of the program.

An interesting challenge for language design is to figure out what capacities could be added to the language to eliminate the need to "leave the language" to solve such problems.

(OR Mapping between the type system of the language and the type system of an RDBMS leaps to mind as such a problem.)

avoid it if at all possible!!

Strong words indeed.

One can indeed produce the same functionality with a well-factored program. Code generation does have one undeniable advantage, though, and that is moving work from run time to compile time.

I use code generation extensively, and find the resultant simplicity and productivity gains (400 DSL lines produces 25,000 lines of working production code) compelling.
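As a toy illustration of that kind of expansion ratio (the DSL and class names here are invented), each one-line record description below expands into a full Python class:

```python
# A tiny declarative DSL: each line is "ClassName: field type, ..."
DSL = """
Point: x int, y int
Person: name str, age int
"""

def generate(dsl):
    out = []
    for line in dsl.strip().splitlines():
        name, fields = line.split(":")
        pairs = [f.strip().split() for f in fields.split(",")]
        args = ", ".join(f"{n}: {t}" for n, t in pairs)
        out.append(f"class {name.strip()}:")
        out.append(f"    def __init__(self, {args}):")
        out += [f"        self.{n} = {n}" for n, _ in pairs]
        out.append("")
    return "\n".join(out)

print(generate(DSL))  # two one-line specs become two full classes
```

Scale the record descriptions up, and the ratio of specification lines to generated lines grows quickly, which is where the productivity claim comes from.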

Depends on the problems

I use code generation extensively, and find the resultant simplicity and productivity gains (400 DSL lines produces 25,000 lines of working production code) compelling.

I'm not familiar with your development circumstances, and I acknowledge that without that info I can't assess the trade-offs of code gen vs. other solutions.

Your statement presumes that generating lines of code is the primary problem to be solved in software development.

Whereas, in any project I've been involved with, the bigger problem is understanding and reasoning about the code already written, since even on a brand-new system you are likely to spend more time doing that than writing code.

With code generation, let's say you run into a bug in some but not all of the code segments that have been generated by a certain template. It is much easier to read the actual code and fix the bugs than to figure out the general rule that needs to be fixed in your generator.

Also, when a newbie to your project (or even you, six months after you last looked at that code ;-) ) looks at the REAL (as in, can be modified normally) code, i.e. the generator, he has to reason about a whole class of behaviour at the same time.

This works great if your template has consistent semantic properties, but if it is really only a syntactic hack, the semantics demonstrated by each piece of code may NOT be the same.

Maybe you have nailed these problems (or they don't apply in your situation) but that certainly isn't the general case.

Advanced programming

I think the argument you are making really applies to advanced programming techniques generally. The more abstract, the harder it is for a newbie (or as you say, even the original programmer later on) to understand. At least with meta-programming, it's easy to see what is being generated at a concrete level, whereas with other approaches the abstract view is all you get. One could stick to a "keep it simple" programming style, but then there's the risk that a conceptually simple bug/enhancement might require changing the source in 500 different spots ...

Simple, not stupid ;-)

One could stick to a "keep it simple" programming style, but then there's the risk that a conceptually simple bug/enhancement might require changing the source in 500 different spots ...

I'm not arguing that abstractions should be avoided, but rather that abstractions which escape the "normal definition" of the language should be avoided.

Shotgun surgery is a symptom of bad design in any non-trivial language. There are usually numerous abstraction mechanisms within a language that can solve the problem.

You don't necessarily need to pull out metaprogramming to solve it. The extra level of (possibly confusing) semantic reasoning will likely not be worth it.

Code generation is strong medicine

As with all concepts in programming language design, it's not so much important whether you have a feature or not, but whether it works well and is well integrated with the rest of the big picture. Maybe you just haven't found the right code generation approach yet (but I am just guessing, of course).

This is a common mistake in such discussions: OOP isn't good as such, but certain OOP languages work well. FP isn't good as such, but certain FP languages work well. Static typing isn't bad as such, but some statically typed languages suck. Code generation isn't bad as such, but some code generation approaches are messy.

It's not the features, it's something else that counts. (The "quality without a name"?)

Eff the ineffable

It's not the features, it's something else that counts. (The "quality without a name"?)

That doesn't give a language designer much to go on, though. ;-)

Code Generation

I think of code generation as:

* Building a Domain Specific Language
* Moving work from Run-Time to Compile-Time (Partial Evaluation if you like)

So from a language designer's point of view, facilitate these.
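A minimal sketch of the second point, in Python (the make_power helper is made up for illustration): the loop over the exponent is executed once, at generation time, leaving only straight-line code to run afterwards.

```python
def make_power(n):
    # Generation time: unroll the loop for a fixed exponent n,
    # producing source code with no loop left in it.
    body = "x" + " * x" * (n - 1) if n > 0 else "1"
    src = f"def power_{n}(x):\n    return {body}\n"
    namespace = {}
    exec(src, namespace)  # compile the specialized function
    return namespace[f"power_{n}"]

power_3 = make_power(3)
print(power_3(5))  # 125, computed as x * x * x with no loop
```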

Never met a meta...

So from a language designer's point of view, facilitate these.

The challenge I was proposing was to avoid having to use these at all by enriching the PL itself.

Building a Domain Specific Language

OK, so you're a language designer and you have spent a lot of time defining your PL so that it has a well defined syntax and semantics.

This is great: programmers everywhere can read code written in your language and know what it does. When bugs or requirements changes come, someone can see what needs to be changed to modify the behaviour.

All of a sudden, Joe Programmer says, hey, I can solve my current task with a DSL or some other kind of metaprogramming.

And now, anyone who wants to understand what the program is doing must learn a SECOND language, one that is possibly poorly thought out or badly documented, or that subverts some aspect of the compile-time checks or unit tests.

An example of this is the mania in the Java world for using some kludgy XML format as a "configuration file" or "runtime script", when some kind of configuration class would have done a better job, and would have required no new learning for a Java programmer.
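To make that contrast concrete, here is a small Python sketch (the class and field names are invented): a typo in a keyword argument fails immediately, while a typo in an XML tag fails only at run time, if it is noticed at all.

```python
from dataclasses import dataclass

@dataclass
class ServerConfig:
    host: str = "localhost"
    port: int = 8080
    debug: bool = False

# ServerConfig(prot=80) raises TypeError at once; a misspelled
# <prot>80</prot> element in an XML file is silently ignored by
# many ad hoc parsers.
config = ServerConfig(host="example.org", port=80)
print(config)
```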

So my whole premise is that having to resort to such measures suggests a problem or limitation with the language, or with the "best practices" being used in it.

Moving work from Run-Time to Compile-Time (Partial Evaluation if you like)

Hmm. It depends what particular examples you have in mind. The cases where I've used this approach have been where the choice was between using reflection or code generation, e.g. OR mapping.

The idea is supposed to be that generated code at least has the benefit of compile-time type checking.
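As a rough sketch of that benefit (schema and names invented for illustration): generating a typed record class from a table description gives a type checker something to verify, where reflective field access by string would not.

```python
SCHEMA = {"users": [("id", "int"), ("name", "str"), ("email", "str")]}

def gen_record(table, columns):
    # Emit a typed class for one table; a checker such as mypy can
    # then verify client code against it, unlike row["naem"]-style
    # reflective access, which fails only at run time.
    fields = "\n".join(f"    {name}: {typ}" for name, typ in columns)
    return (
        "from dataclasses import dataclass\n\n"
        "@dataclass\n"
        f"class {table.capitalize()}Row:\n{fields}\n"
    )

print(gen_record("users", SCHEMA["users"]))
```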

My point of view is that they are just two different flavours of poison that may cure some problem, but that leave you wanting medicine with fewer side effects.

but when a language is sick...

While I certainly agree that code generation is strong medicine, so many languages of today are sick in one way or another that resorting to strong medicine is the only way to remain somewhat productive!

Some languages do have very nice abstraction features - I love them and use them all the time. Unfortunately, I am forced to sometimes pay an extremely high run-time penalty for my nice code. This is unacceptable. Since compiler writers still seem to cling to the separate compilation red herring, they force me to turn to code generation instead. By inverting the three Futamura projections of partial evaluation, a (good) code generator can easily be seen as a set of user-written compiler extensions. And sometimes, even the compiler writers blur this line too: any language that generates C (or C--) in its back-end is part compiler, part code generator.

When abstractions like 'classes' and 'modules' completely disappear from the object code generated by your favourite language, it will be time to seriously re-examine the code generation issue. Until then, I will always prefer code generation over copy-paste coding!

Time for the doctor

so many languages of today are sick in one way or another that resorting to strong medicine is the only way to remain somewhat productive!

If you gotta, you gotta. I'm interested to know though how the sick languages could be made healthy, i.e. what features or design choices would solve their problems.

Unfortunately, I am forced to sometimes pay an extremely high run-time penalty for my nice code

Are you thinking of a scenario such as the old C trick of inlining with a macro a function that is too small/too frequently called to be worth the function call overhead?

It is an unfortunate truth of programming that sometimes a simple, elegant, general solution does not perform well enough, and a particular, more complex solution must be tried instead. So be it.

But I'd probably try the nice solution first anyway to be sure that its limitations were really a problem.

Until then, I will always prefer code generation over copy-paste coding!

If those were my only two options, I would too. I'd want to try really hard to make sure I didn't have better options first, though.

[small typo fixed]

The pseudo-doctor is in session

I'm interested to know though how the sick languages could be made healthy, i.e. what features or design choices would solve their problems.

There are so many ways... languages from different paradigms need different 'fixes', naturally. In imperative languages, say, right now one has access only to assignment; why not also allow static single assignment? In general, one can (and should?) add 'redundant' features that nevertheless have much stronger invariants associated with them than native language features.

In OO languages, you can make a distinction between overriding a method with a new one (same signature, possibly different behaviour) and augmenting a method. Any syntactic mechanism that can strengthen invariants (co-algebraic for OO, algebraic for functional, etc.) is worthwhile.

Even though while loops are sufficient, for loops are better, foreach loops are better still, map/fold better still. Why? If nothing else, stronger invariants are true for each of them. More statically true knowledge always allows more optimization opportunities.
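The same sum written down that ladder, as a Python sketch; each step rules out more ways to get it wrong:

```python
from functools import reduce

xs = [1, 2, 3, 4]

# while: index management and termination are the programmer's problem
total, i = 0, 0
while i < len(xs):
    total += xs[i]
    i += 1

# foreach: no index to get wrong, but still mutable state
total = 0
for x in xs:
    total += x

# fold: no mutation at all; the shape of the computation is fixed
total = reduce(lambda acc, x: acc + x, xs, 0)
print(total)  # 10 in every case
```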

Are you thinking of a scenario such as the old C trick of inlining with a macro a function that is too small/too frequently called to be worth the function call overhead?

Definitely. The question is: why aren't standard compilers smart enough to do this on their own? At least for common cases? gcc is getting better at this, and more advanced compilers too, but Java is going completely backwards on this. OCaml can do some nice inlining of functions, but stops at module boundaries -- which makes using its functors quite expensive.

It is an unfortunate truth of programming that sometimes a simple, elegant, general solution does not perform well enough, and a particular, more complex solution must be tried instead. So be it.

I refuse to accept that. I want the simple, elegant, general solution to perform well! What I want is a language in which I am able to help the compiler along so that it can see how to optimise my code properly. But if the compiler won't let me help it, what am I to do? Design a language in which I can, of course ;-) (There is nothing much to see yet, but I am working on the language for MathScheme, which will allow this.)

The more specific the optimization, the less general

I refuse to accept that. I want the simple, elegant, general solution to perform well!

And I would like P=NP. Unfortunately, we may both be disappointed. ;-)

As I mentioned in another thread, algorithms often have to trade off generality against performance. In math in general, there is often a trade-off between generality and provability, or ease of proof.

I suspect that at least some cases of abstractional generality will turn out to be like this.

What I want is a language in which I am able to help the compiler along so that it can see how to optimise my code properly.

I've expressed my general feeling about this in another thread.

To add to that, consider a cross-platform application (a type of generality, no?). You add a bunch of optimization hints to your code. These are based on your knowledge of the guts of a particular compiler on a particular platform.

Suddenly, you want to compile on a different platform. The hints are no longer relevant, but the compiler has no way to know this. Why not? Because it would have to know how the other platform worked and what you intended at the time to make a substitution, or choose to ignore it.

Or all the compilers on all the platforms could just ignore most of the hints, in which case you wasted a lot of typing and dirtied up your source code. ;-)

Now if, contra this contention, there were a general substitution that COULD be made mechanically, without this external knowledge and human-like judgement, then you could just augment the optimizer with such a mechanical substitution. In which case, you don't need the hints anymore.

So you have either hobbled your program on platforms other than the one you used, wasted your time with the annotations, or you can choose to add MORE annotations and descend into #IFDEF hell, making sure that any platform you might port to gets its proper optimizations.

How badly do you need that performance again? ;-)

General versus specific

I actually agree with most of your comments. What I want to explore is how far one can use specialization to regain efficiency in abstract algorithms. In the situations you point out, I agree that little can be done.

But in some mathematical applications (see the link to the paper on Gaussian Elimination in a previous comment), one can program very abstractly and still get extremely good performance. Other applications (Boost, Blitz++ amongst others) have shown similar results. So while in general this is unsolvable, it may be that in practice, one can still get pretty far indeed.

Lisp?

I refuse to accept that. I want the simple, elegant, general solution to perform well! What I want is a language in which I am able to help the compiler along so that it can see how to optimise my code properly.

Isn't this one of Lisp's great claims to fame? That you can write a simple, elegant, general solution, and then go back and use macros and type annotations to make it perform well.

Abstraction, yet again

It all depends on the abstraction facilities provided by your language. That's what I was getting at, if it wasn't clear.

MetaBorg

For embedding object languages in a host language (and of course, these two can be the same language), MetaBorg can be a useful approach. It embeds the object syntax in the host language, so you can (e.g.) generate object code using concrete syntax. In many cases, this means not only less source code, but also code that is much easier to write and read. In addition, the generated object code is guaranteed to be syntactically valid (compare this to printing strings).

It is built on SDF and the Stratego language, which also means your environment will get more complex with these tools added (but that's just as true for adding some Python system).

I don't think this was mentioned here before.