Code generation vs. dynamic/introspective languages

Question from this seemingly eternal newbie: I don't grok things enough yet, but it seems like something in the world of Scrap your Boilerplate / GADTs / etc. type stuff might be able to obviate evil code generation in more static-than-dynamic languages? Or am I just hopelessly befuddling concepts?

abstraction

code generation serves abstraction. think of high level languages vs assembler, database api vs object-relational mapping, or handwritten parsing vs parser generation.

similarly, scrap your boilerplate abstracts over generic data traversal, gadts abstract over tree-like data structures, just like monads abstract over side effects.

you are right that with the additional expressivity of type systems you can implement many abstractions directly inside the language.

on the other hand, code generation might be more practical in some domains. otherwise we might already be programming in typed assembly languages.

I disagree.

I find code generation extremely valuable, and I would really like it if the languages I have to use at work supported code generation.

I have used code generation many times, in these cases:

1) creating an object model. I have developed a little application which takes an object model description in XML and turns it into code.

2) creating a tool that generated a parser which automatically fills in an AST. My tool was fed the grammar description and an AST description with links to grammar rules, and the resulting code was a parser that built the AST automatically.

3) creating a database schema. Another useful tool that I needed was one that was fed a database description and created the database and the classes to handle the database (this was before Hibernate).

If I had the luxury of processing code in my own way during compile, then all these tools would be automated tasks taking place at compile-time.

red herring?

Your point 1 implies that the "languages you have to use at work" are actually more verbose than XML - which is why you want to define the schema in XML instead of code, in the first place. Maybe it would be more cost-effective to fix the verbosity problem, rather than bolt on even more features for code generation? Ditto for point 2.

Only point 3 seems to be a real problem, but Hibernate seems to have solved it for you.

Fixing the verbosity problem means fixing the language.

Take Java, for example: I cannot describe objects as simple records and have the compiler write the get/set methods. Being able to manipulate declarations as I see fit at compile time would be a great advantage.

As for Hibernate, it is so huge and cumbersome that it has become a problem by itself. The code generation approach would be much better, because it would allow me to prepare the execution environment without needing to know and use anything else other than the language I am using.

Backwards Mindset

In all cases the better approach is to:

1) Create the object model in code. Add a builder protocol to allow the object model to describe itself to a builder.

Create an XML builder and apply it to the object model.
or:
Create the AST as an object model and use it to write a recursive descent parser that builds itself.
or:
Create a SQL/DDL builder and apply it to the object model. In all cases I have one source of record - my object model.
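A minimal Java sketch of the "builder protocol" idea described above, assuming a very small object model. Every name in it (`BuilderSketch`, `ModelBuilder`, `Entity`, `XmlBuilder`, `DdlBuilder`) is hypothetical and made up for illustration; it is one possible reading of the approach, not code from the poster.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; an illustration only.
class BuilderSketch {

    // The builder protocol: hooks the object model calls to describe itself.
    interface ModelBuilder {
        void beginEntity(String name);
        void field(String name, String type);
        void endEntity();
    }

    // The single source of record: an entity defined in code.
    static final class Entity {
        private final String name;
        private final List<String[]> fields = new ArrayList<>();

        Entity(String name) { this.name = name; }

        Entity field(String fieldName, String type) {
            fields.add(new String[] { fieldName, type });
            return this;
        }

        // Walks the model and reports each part to whatever builder is supplied.
        void describeTo(ModelBuilder builder) {
            builder.beginEntity(name);
            for (String[] f : fields) builder.field(f[0], f[1]);
            builder.endEntity();
        }
    }

    // Emits an XML description (no escaping; names are assumed to be plain identifiers).
    static final class XmlBuilder implements ModelBuilder {
        private final StringBuilder out = new StringBuilder();
        public void beginEntity(String name) { out.append("<entity name=\"").append(name).append("\">"); }
        public void field(String name, String type) {
            out.append("<field name=\"").append(name).append("\" type=\"").append(type).append("\"/>");
        }
        public void endEntity() { out.append("</entity>"); }
        String result() { return out.toString(); }
    }

    // Emits SQL DDL from the same model; the type mapping is deliberately crude.
    static final class DdlBuilder implements ModelBuilder {
        private final StringBuilder out = new StringBuilder();
        private boolean first;
        public void beginEntity(String name) { out.append("CREATE TABLE ").append(name).append(" ("); first = true; }
        public void field(String name, String type) {
            if (!first) out.append(", ");
            out.append(name).append(' ').append(type.equals("int") ? "INTEGER" : "VARCHAR(255)");
            first = false;
        }
        public void endEntity() { out.append(");"); }
        String result() { return out.toString(); }
    }

    public static void main(String[] args) {
        Entity person = new Entity("Person").field("name", "String").field("age", "int");
        XmlBuilder xml = new XmlBuilder();
        DdlBuilder ddl = new DdlBuilder();
        person.describeTo(xml);
        person.describeTo(ddl);
        System.out.println(xml.result());
        System.out.println(ddl.result());
    }
}
```

The same `Entity` drives both outputs, which is the "one source of record" point: adding a new target format means writing another builder, not another description of the model.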

The fewer tools I use, the better.

Scattering a project amongst many tools, having to learn and maintain each tool, is not my preferred way of doing things. I think it is much simpler to use the same programming language to manipulate itself.

Nonsense

I think there's no question which choice is easier to maintain. (using HTML/DOM as an example since it's commonly understood)


<h3>Lambda The Ultimate</h3>
<p>Is <b>very</b> nice</p>

versus eg.


Element header = document.createElement("h3");
header.appendChild(document.createTextNode("Lambda The Ultimate"));
document.appendChild(header);
Element paragraph = document.createElement("p");
paragraph.appendChild(document.createTextNode("Is"));
Element bold = document.createElement("b");
bold.appendChild(document.createTextNode("very"));
paragraph.appendChild(bold);
paragraph.appendChild(document.createTextNode("nice"));

It's harder to spot bugs in the "code-DOM" approach (there are at least 3 in the example above, to demonstrate).

??

Obviously that example is, but that's mostly because of the way you wrote it. This isn't particularly verbose, for example:

body = Body(...,
    H3("Lambda The Ultimate"),
    P("is ", B("very"), " nice")
)

Yes

That's a lot better of course, but you have to define the H3, P, B etc. somewhere. You either define them by code generation (back where we started from), create them at runtime as closures eg. in JavaScript (in this case your programming language is powerful enough to do most things without code generation anyway), or hand-code them (ugh).

The point of the post was to compare DSLs and (D)OM builders in incapable languages like Java (where complex literal syntax is non-existent).

umm

Java has constructors, doesn't it? If so, then it's completely possible to do exactly the same thing as above, albeit with the added verbosity of types.

oops

I forgot about java's abysmal list facilities and lack of varargs. My mistake.

Java 5 has varargs

BTW, Java 5 does have varargs. Though the only way to accommodate both "h3" objects and strings in the same list would be to use a vararg array of type "Object".
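A rough sketch of what that could look like with Java 5 style varargs, assuming a tiny hand-written element class. The names `HtmlDsl`, `Element`, `h3`, `p` and `b` are invented for this example, and the `Object...` parameter is exactly the loss of static typing mentioned above: anything can be passed as a child.

```java
// Hypothetical mini-DSL; names are illustrative, not from any real library.
class HtmlDsl {

    static final class Element {
        private final String tag;
        private final Object[] children;

        // Children may be Elements or anything else (rendered via String.valueOf).
        Element(String tag, Object... children) {
            this.tag = tag;
            this.children = children;
        }

        String render() {
            StringBuilder sb = new StringBuilder("<" + tag + ">");
            for (Object child : children) {
                if (child instanceof Element) {
                    sb.append(((Element) child).render());
                } else {
                    sb.append(escape(String.valueOf(child)));
                }
            }
            return sb.append("</").append(tag).append(">").toString();
        }
    }

    // One small factory per tag; these still have to be written (or generated) somewhere.
    static Element h3(Object... children) { return new Element("h3", children); }
    static Element p(Object... children)  { return new Element("p", children); }
    static Element b(Object... children)  { return new Element("b", children); }

    // Minimal text escaping so string children cannot inject markup.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        System.out.println(h3("Lambda The Ultimate").render());
        System.out.println(p("Is ", b("very"), " nice").render());
    }
}
```

With static imports of the factory methods, the call site reads close to the `P("is ", B("very"), " nice")` form shown earlier in the thread, at the cost of the compiler no longer checking which children a tag accepts.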

Well obviously...

This is partly due to using an absurdly verbose language (Java?), but also due to the fact that HTML/XML carries no associated semantics. Those have to be communicated out-of-band while the code spells out what is allowed and what isn't, i.e. it attaches semantics. (This may not be so apparent with HTML/XML output, but it certainly is with XML input.)

I am aware of DTD and XSD, but they are not enough. The arbiter of what is acceptable is always going to be the (business) logic, so why not simply generate a parser based on that?

Tricks

Way back in '97 Weblogic had a HTML-generation library with a kinda cute approach to this in Java. Example code snippet from their Javadocs:

  HtmlPage hp = new HtmlPage();
  hp.getHead()
    .addElement(new TitleElement("Hello World"))
    .addElement(new LinkHeadElement(LinkHeadElement.relTag,
                                    LinkHeadElement.relCopyright,
                                    "http://www.weblogic.com/copyright.html"));
  hp.getBodyElement()
    .setAttribute(BodyElement.bgColor, HtmlColor.white)
    .setAttribute(BodyElement.textColor, HtmlColor.navy);
  hp.getBody()
    .addElement(MarkupElement.HorizontalRule)
    .addElement(new StringElement("Hello World!").asBoldElement())
    .addElement(MarkupElement.HorizontalRule);
  hp.output();

This is not very concise in comparison to e.g. Lisp but it is impressive to see what tricks people come up with when they get creative with the language at hand. That's really hard to do when you are using a language reluctantly and are reconciled to the idea that it sucks. :-)

I'd assumed that the trick was borrowed from Objective-C but Peter "Practical Lisp & Weblogic Hacker" Seibel tells me it was an independent invention. Small world. :-)

Hey I'm all up on Java this month. I was talking nostalgia with some friends at work and discovered that the first ever paid computer work I did (10 years ago! I'm getting on!) is still online and still runs! It's the arcade game Araknoid if you want to try your skill :-) Now I wish I could find my old Amiga demo programs!

No Conflict

There's no conflict between powerful type systems and code generation--on the contrary; combining powerful type systems and code generation tends to lead to the various efforts in multi-stage programming, of which MetaOCaml is an excellent example.

Peace, man.

Gotchya (again with the reminders from LtU that I need to learn O'Caml :-).

The blog entry pointed out that debugging through layers of generated code is hell, so I was hoping that there would be really great (if not so widely known) ways to get the equivalent power without code generation causing so much undebuggable verbosity.

Macros can be cool, but possibly inscrutable. Libraries can be cool, but possibly too verbose / something of an impedance mismatch. Are functors easy to grok and debug? Are there forms of genericity or parametricity or polymorphicness that are really cool and concise and powerful and easily debugged? I'm utterly unpracticed with such things.

(The blog's thought that debugging is important is what resonated with me. It would be interesting to follow discussion that looked at all of the programming tools from the perspective of ease of debugging. I guess it also depends on the ui of the debugging tools.)

[EDIT: uh, actually, maybe I just read into the blog the idea about debugging because that is my own hobby horse.]

blocks

Ever heard of Smalltalk? Blocks are a pretty good replacement for this kind of metaprogramming if you don't mind the speed hit usually associated with them. Of course, this really depends on the language implementation: it's entirely conceivable to use blocks in the same way as macros without the performance hit.

Generation for translation is different

"The point of the post was to compare DSLs and (D)OM builders in incapable languages like Java (where complex literal syntax is non-existent)."

So that's your clue that Java is the wrong tool for the job.

It is also not true that HTML tags carry no associated semantics. There are definite sets of attributes that vary among tags, as well as lineage constraints, that can be embodied in code.

"you have to define the H3, P, B etc. somewhere. You either define them by code generation (back where we started from)"

No, that is different. You are tempted to do code generation once as a means of translating from one representation to another. It would have been better to have started in code, but you didn't, and it would be pointless not to make use of automation to convert your text specification into code. But from then on, you use the code. Code generation isn't a way of life - it's an expedient way to get from there to here.

OTOH, stuff like IDL from CORBA days and its use to generate tons of stubbed classes was just insane. Java RMI is similar in this regard. This is the point at which you should be asking yourself "can I get out of doing this routine generation and regeneration if I switch languages" and, of course, the answer is "yes". You really need a language with decent meta facilities to efficiently do distributed objects.

Agreed

It was more of a "in-language DSLs/OM builders are bad in Java" post, not that OM builders are bad in general. If I was using Lisp or Ruby, say, I would certainly take the "code is specification" approach.

Sometimes you don't even need metaprogramming for it if your type system is powerful enough, e.g. Haskell.

(Sufficiently capable type systems are indistinguishable from metaprogramming?)

Code generation

Code generation is useful even if your language is powerful. HaskellDB uses code generation to write Haskell types from table definitions. At work I wrote a tool that parses SQL scripts with create and alter table statements and generates Java classes that are 1-to-1 mappings to the tables (i.e. Row Data Gateway). It's really useful, because it takes care of the boilerplate code. Also, from the table definitions you can automatically determine the joins required between classes and write code like: table1.join(table2).join(table3).where("PK1 = ? and FK2 = ?").select("Foo", "Bar") instead of having to write dozens of lines of JDBC code.
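To make the table-to-class idea concrete, here is a deliberately tiny generator sketch in Java. It is not the tool described above: `RowGatewayGenerator` and its methods are hypothetical names, it only understands the simplest `CREATE TABLE name (col TYPE, ...)` form, and the type mapping is an assumption chosen for the example. Constraints, `ALTER TABLE`, and types with commas such as `DECIMAL(10,2)` are out of scope.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical generator; handles only "CREATE TABLE name (col TYPE, col TYPE, ...)".
class RowGatewayGenerator {

    private static final Pattern CREATE =
        Pattern.compile("(?is)create\\s+table\\s+(\\w+)\\s*\\((.*)\\)\\s*;?\\s*");

    // Returns Java source for a class with one public field per column.
    static String generate(String ddl) {
        Matcher m = CREATE.matcher(ddl.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("unsupported DDL: " + ddl);
        }
        StringBuilder src = new StringBuilder("public class " + capitalize(m.group(1)) + " {\n");
        for (String column : m.group(2).split(",")) {
            String[] parts = column.trim().split("\\s+");
            if (parts.length < 2) {
                throw new IllegalArgumentException("unsupported column: " + column);
            }
            src.append("    public ").append(javaType(parts[1])).append(' ')
               .append(parts[0].toLowerCase()).append(";\n");
        }
        return src.append("}\n").toString();
    }

    // Assumed mapping: integer types become int, character types String, the rest Object.
    static String javaType(String sqlType) {
        String t = sqlType.toUpperCase();
        if (t.startsWith("INT")) return "int";
        if (t.startsWith("VARCHAR") || t.startsWith("CHAR")) return "String";
        return "Object";
    }

    static String capitalize(String s) {
        return Character.toUpperCase(s.charAt(0)) + s.substring(1);
    }

    public static void main(String[] args) {
        System.out.print(generate("CREATE TABLE customer (id INTEGER, name VARCHAR(40))"));
    }
}
```

A real tool would also need to track keys to derive the joins mentioned above; this sketch stops at the field list.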

While I agree that generating tons of stub classes for CORBA and RMI borders on insanity, I can't deny that code generation is useful wherever the target code can be derived clearly from the specification. Right now I'm writing a tool that uses the specification of the messages sent to a legacy mainframe system to help the developers write code to test it.

Even if a language is sufficiently powerful for a task, let's say parsing in Haskell using Parsec, people may still prefer code generators for performance or simplicity reasons (e.g. it's usually simpler to understand code that you can see than the dynamic semantics of a library and its many operations).