Code Generation Network

It's been quite a while since I visited codegeneration.org, and it seems like the site grew considerably, so you might want to check it out again too.

Code generation is an important programming technique (not to be confused with the code generation phase of compilers), which I am sure everyone here is familiar with. It seems to me, though, that the percentage of programmers who know about code generation is relatively small. Am I right in this assumption? I am not asking about people actually using the technique, mind you, just about people who know that it exists and what it means, and who don't think the basic idea is "strange" or involves dark magic.

I wonder where, if anywhere, programmers (and CS students) should learn about it. And no, the answer "well, on LtU of course" isn't a good option!

"The Practice of Programming"

Chapter 9 of "The Practice of Programming" by Brian W. Kernighan and Rob Pike, on "Notation", talks about "Programs that Write Programs".
It has practical examples including network packet parsing and a simple regular expressions implementation.
http://cm.bell-labs.com/cm/cs/tpop/

Their earlier classic "The Unix Programming Environment" also has a chapter on yacc.

These books are actually read by programmers in the field, so I would guess most people who program on Unix are familiar with the idea.

Good

I like these guys, so I am glad they are being read. I fear you are a bit too optimistic about just how many programmers really read about programming when they get home from work...

Aside from hoping they read, is there anything the education system should be doing?

(BTW, people should also read Jon Bentley. He belongs to the same Bell Labs gang...)

Personal experience

Just some notes from my personal experience studying CS.

At the end of the second CS year at the VU we had a 'software engineering' practical where the focus was on modelling, documentation, and implementation. The end product had to be some sort of graphical plugin for the Eclipse framework. Because of the required modelling phase (UML), almost every team used code generation in the next step, using Eclipse's EMF (I can't recall whether this was dictated). But what happened next was that people started messing around with the generated code, which made regenerating it practically impossible. All I remember is that you would not want to maintain any of the resulting plugins :)

This year I followed the course on Program Transformation at the UU. Obviously this involved writing code generation tools. The Stratego team at the UU is doing nice things with code generation; one of the more developed applications is the MetaBorg method for embedding DSLs (e.g. generating Swing code from embedded UI descriptions).

Now some comments on the opening post.

I don't think the percentage of programmers who know about code generation is that small. Some of the tools listed at the site are pretty widespread (O/R mappers, 'template engines', JavaDoc & co., ...).

Also, I really don't see much difference, conceptually, between 'code generation' as a programming technique and the 'code generation' phase of a compiler. The difference seems to come down to the abstraction level of the resulting code (low for a compiler, higher for, say, EMF). But the input is usually just one level of abstraction higher than the output, and in the end it just comes down to this translation. Or am I missing some magic in 'code generation' here?

I am glad to learn that I was

I am glad to learn that I was too pessimistic.

Code generation as a last refuge

I don't think you should encourage people to learn code generation techniques. Certainly not if it involves generating source code files which have to be compiled later. If you really need code generation your programming language simply doesn't have enough abstraction power.

If you really need code

If you really need code generation your programming language simply doesn't have enough abstraction power.

Sure.

I don't think you should encourage people to learn code generation techniques.

Learning isn't the same as "using". Knowledge is power, bla bla bla...

True

From personal experience, I never had to resort to code generation in Python, but it often comes in handy in Java.

I tend to use it when I am obliged to use a framework (EJB anyone?) that requires a lot of dumb code (getters/setters).

I think it is also related to the introspection/reflection/dynamicity of the language.

Anyway, it could be a new metric for defining the abstraction power of the language :)
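To make the getter/setter case above concrete, here is a minimal sketch of the kind of throwaway generator one might write. It is an illustration only: the function name `java_bean` and the `Customer` example are made up for this sketch and do not come from EJB or any real tool. It takes a list of (type, name) pairs and emits Java source text.

```python
def java_bean(class_name, fields):
    """Return Java source for a class with private fields and accessors.

    `fields` is a list of (java_type, name) pairs. Everything here is a
    hypothetical illustration, not the output format of any real framework.
    """
    lines = [f"public class {class_name} {{"]
    # Declare one private field per entry.
    for java_type, name in fields:
        lines.append(f"    private {java_type} {name};")
    # Emit a getter and a setter per field, using the usual bean naming.
    for java_type, name in fields:
        cap = name[0].upper() + name[1:]
        lines.append("")
        lines.append(f"    public {java_type} get{cap}() {{ return {name}; }}")
        lines.append(
            f"    public void set{cap}({java_type} {name}) {{ this.{name} = {name}; }}"
        )
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the generated Java; in practice it would be written to a .java file.
    print(java_bean("Customer", [("String", "name"), ("int", "age")]))
```

The point of the sketch is that the field list is the only thing a human maintains; the repetitive Java is regenerated whenever that list changes rather than edited by hand.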

More on generated code in Java

Yup. One thing about generating parsers using JavaCC is that once they're generated you can use all the usual Java static typing stuff. It'd be possible to write code to read in a grammar and generate the parser and such at runtime, but with that much runtime dynamic action going on one might as well use Ruby.

If you really need code

If you really need code generation your programming language simply doesn't have enough abstraction power.

In that case all I have to do is convince the entire company to move to a different language. Or at least let me program in a language that is powerful enough. Unfortunately I don't think that will happen any time soon.

What sorts of languages were you thinking about? Macros (a la Lisp) would, in my opinion, fall under the umbrella of code generators, just much cleaner and less error prone than the typical generator.

Once we start defining these things the distinction between code generators, interpreters, compilers, etc. tends to blur. I'm sure these techniques really lie on a continuum; it's simply a matter of which place is the best place to be for a particular project. This is demonstrated in a quick-n-dirty (and probably very wrong) diagram:

   rigid and static                                    flexible and dynamic
<--passive generators---active generators--macros--compilers---interpreters-->

I'm guessing that the cleanest option for many projects is to build a DSL in the form of an interpreter, but sometimes that might be overkill.

Note: When I say “rigid and static” and “flexible and dynamic” I'm not talking about the type system, but about the ability of the system to change. Passive generators are run once; subsequent changes get made by hand. Interpreters, on the other hand, have the ability to dynamically generate, modify, and change their own code at runtime.
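As a rough illustration of the two ends of that continuum, the sketch below handles the same tiny, made-up spec in both ways. The spec format, the `OPS` table, and the function names are assumptions invented for this example. The generator route produces source text that could be saved and, in the passive case, edited by hand afterwards; the interpreter route reads the spec each time it runs, so changing the spec changes the behaviour immediately.

```python
# Hypothetical mini-DSL: a list of (operation, argument) steps applied to x.
OPS = {"add": "+", "sub": "-", "mul": "*"}


def generate_source(spec, func_name="run"):
    """Generator route: turn the spec into Python source text."""
    lines = [f"def {func_name}(x):"]
    for op, arg in spec:
        lines.append(f"    x = x {OPS[op]} {arg!r}")
    lines.append("    return x")
    return "\n".join(lines) + "\n"


def interpret(spec, x):
    """Interpreter route: walk the spec directly at runtime."""
    for op, arg in spec:
        if op == "add":
            x = x + arg
        elif op == "sub":
            x = x - arg
        elif op == "mul":
            x = x * arg
        else:
            raise ValueError(f"unknown op: {op}")
    return x


if __name__ == "__main__":
    spec = [("add", 3), ("mul", 2)]
    source = generate_source(spec)
    print(source)

    # exec() stands in for "write the text to a file and import it".
    namespace = {}
    exec(source, namespace)
    print(namespace["run"](4), interpret(spec, 4))
```

Both routes compute the same result for the same spec; what differs is when the translation happens and whether its output is an artifact someone might be tempted to edit.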

Does any of that make sense?

CAMLP4, anyone?

To blur those distinctions perhaps more, there's camlp4, where you can take the existing ML grammar and plug in new productions — or start from scratch — and generate an abstract syntax tree using convenient (if slightly ugly-looking compared to Lisp/Scheme) quotations. So, it's kind of like a macro system, except you can parse more or less arbitrary text with it; so, there's no reason it couldn't be used to parse your DSL of choice and emit OCaml source, which sounds more like a code generator.

One More Time, With Feeling

One-Day Compilers, Or: How I Learned to Stop Worrying and Love Static Metaprogramming, implementing a DSL in O'Caml using camlp4 and generating C. :-)

PDF

The most interesting code generator I've seen recently is in our current system and compiles from PDF standards documents published by ETSI into Erlang protocol stubs.

You Win

I'm sorry, I can't top that. That's genius.

Reminds me of ruby-jdwp

I work with Rich Kilmer; he did lots of work on a Ruby bridge to the Java Debug Wire Protocol. As I recall he generated a ton of Ruby code directly from the JDWP spec, which made for a pretty hefty Ruby file!

Code Generation Network

I took over as editor of the Code Generation Network in March 2006. Jack Herrington, the site's previous editor, did a great job building up the site over the years and I am hoping to continue his success. If anyone is interested in submitting an article for the site then please let me know.

I am also organising a major Code Generation conference for May 2007 in Cambridge, UK. Please visit Code Generation 2007 for more information.

We are currently seeking session proposals (deadline Friday January 12th 2007) and accepted session leaders will get their fees waived for the whole three-day conference.

Hope to see some of you there.

Regards,
Mark Dalgarno