## XNHTML

Ning is a new free online service for building and using social applications.

The apps are built using PHP and a simple XML-vocabulary based DSL called XNHTML.

As I've been saying for a long time here and elsewhere, it's all about programmability these days, and, as the Ning folks realise, DSLs are a very good technique for achieving end-user programmability.

Seems to me they could have gone the extra mile, and eliminated the need for PHP altogether, but I guess that would be asking for too much...

### PHP not required

Actually, you don't *have* to have PHP and they are going to add support for other languages.

The XML vocabulary is heavily influenced by Java's JSP. Many of the XNHTML tags have the exact same format as a JSP tag.

What's most interesting about this is that the XNHTML processing step occurs *after* the PHP (or whatever you use, including a static page) processing. Because of that, I don't think you can have your own application data passed to the XNHTML processor.

That said, it does have access to a shared store, so your application could update the shared store and then the XNHTML page can access that.

### Clarification

> Actually, you don't *have* to have PHP and they are going to add support for other languages.

I should have noted that they mention adding support for Python and others. However, that's not what I meant.

I was thinking about a DSL/end-user programming solution, that would eliminate the need for "real" programming knowledge...

I use TAL a lot on my zope sites.

It looks like:

<table tal:define="numbers python:['one','two','three']">

<tr tal:repeat="this_num numbers"><td tal:content="this_num"></td></tr>

</table>

Odd attributes are usually ignored by existing HTML/XML editors, so you get programmability and syntax highlighting/checking without requiring new code.

### This is JSP taglibs

Slava Pestov nailed that coffin long ago.

I'd recommend staying away from such techniques.

### i suspect you know the following

[this was a reply to "nailing jsp"...]

i suspect you know the following, but here goes nothing anyway...

the example in your link *is* terrible. but no-one worth their salt would write that anyway.

i decided to develop a web site of my own a week or two back. i wrote the database while killing time at a conference last week. today i decided i better start writing the site itself. i came here to kick back and relax because i just got authentication working. in a single day, using the acegi toolkit. that's not just type in a name and check against a database - it's the whole framework, including persistence across sessions with cookies, acls, etc etc.

the recent changes in java are impressive. acegi uses aop to transparently insert security checks (which are defined in a declarative fashion using - shock - xml). while ejbs are a piece of cake now we have ejb3 (using annotations to make life oh so much simpler). and spring provides a clear framework for structuring the server code.

programming a web service in java these days is a pleasure.

getting back to your strawman example - have you any idea what it would look like if you'd used the (pretty much standard) spring mvc support? all that jdbc crap wouldn't be there for a start.

it's not the java your mom knew.

### Focus

Maybe I just live in the wrong world. (Didn't develop enough Enterprise Applications?)

In my world, the complexities just aren't distributed this way.

In my world, generating XML or HTML from databases is a solved problem. The solution doesn't require OOP, AOP, object-relational mappings or annotations; all I need is a good procedural language (e.g. PHP or Ruby, I don't even want Ruby's OO support) and a good grasp of SQL.

(There's even a thing called the "no-layer architecture": using SQL to generate HTML directly. You may be surprised, but it seems to work pretty well, and is standard practice in some Oracle shops, as far as I have seen... At least it doesn't require a half dozen XML vocabularies and some bytecode manipulation.)
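For readers who haven't seen the "no-layer" style, here is a minimal sketch in Python, with SQLite standing in for Oracle. The `posts` table and its columns are invented for illustration, and nothing here escapes HTML, so treat it as a picture of the idea rather than production code.

```python
import sqlite3

def render_posts(conn):
    """Let the SQL query itself emit HTML rows; Python only joins them."""
    rows = conn.execute(
        "SELECT '<li><b>' || author || '</b>: ' || title || '</li>' "
        "FROM posts ORDER BY id"
    )
    return "<ul>" + "".join(r[0] for r in rows) + "</ul>"

# Hypothetical table, invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, author TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO posts (author, title) VALUES (?, ?)",
    [("ann", "Hello"), ("bob", "Re: Hello")],
)
html = render_posts(conn)
print(html)
```

All the page structure lives in the query string; the host language does nothing but concatenate the rows.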

In my world, the hard parts of websites are design and usability. The things to master are HTML (yep, raw HTML), CSS, JavaScript, browser glitches, graphic design, interaction design. The winners here are the likes of Flickr and Gmail.

### XNHTML

You can actually write entire apps on Ning in just XNHTML.

The Simple Bulletin Board example app (http://simplebulletinboard.ning.com/) illustrates this... there are two sets of source files in that app: one set in all PHP, the other in all XNHTML. They do the exact same things and operate against exactly the same data, and you can switch back and forth arbitrarily.

As dtauzell said, XNHTML processing happens *after* PHP execution, unlike a typical templating approach where the templates are interpreted before the scripting language runs. We did it this way so that you can emit XNHTML from your PHP scripts and the Ning system itself does the right thing with those XNHTML tags (i.e. displays content objects) as they leave the system. This makes it quite easy to have simple PHP scripts generate simple XNHTML pages that actually do powerful things.
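The ordering described above can be pictured with a small Python sketch. This is not Ning's implementation: the `x:content` tag, the `SHARED_STORE` dictionary and both function names are made up, and a regular expression stands in for a real tag processor.

```python
import re

# Stand-in for the shared content store; the key is invented.
SHARED_STORE = {"motto": "Build your own social app"}

def run_script(user):
    """Stage 1: the 'script' emits HTML that still contains a custom tag."""
    return f'<p>Hello, {user}!</p><x:content key="motto"/>'

def expand_tags(markup, store):
    """Stage 2: the platform expands custom tags as the page leaves the system."""
    def lookup(match):
        return "<p>" + store.get(match.group(1), "") + "</p>"
    return re.sub(r'<x:content key="([^"]*)"/>', lookup, markup)

page = expand_tags(run_script("world"), SHARED_STORE)
print(page)
```

Because the tag pass runs last, the script can only influence it through the markup it emits or through the shared store, which matches the earlier observation in this thread.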

We'd be happy to get feedback on any of this, it's quite experimental at this stage and we'll be evolving it based on how people use it and what people need.

### PHP

Right now, PHP means "PHP: Hypertext Preprocessor". Originally, if I remember right, it was "Personal Home Page" - an end-user DSL for writing web pages.

I smell a trend here.

If an end user can't grok PHP's "if" and "while", he/she probably won't grok "c:if", "c:foreach", "xn_query" and "xn_filter".

### End-User DSLs

> If an end user can't grok PHP's "if" and "while", he/she probably won't grok "c:if", "c:foreach", "xn_query" and "xn_filter".

If you build your DSL correctly, then it either provides domain specific abstractions, which in this case might be widgets that can serve as building blocks for NING applications, or provides a way to build them and export them to other programmers who use the DSL (i.e., a restricted module system).

It might be possible to support mix and match visual programming based on selecting and connecting widgets. The graphic app designer then produces source code (in XNHTML etc.) that can be reused, or modified manually by more expert users.

### GUI development

"Selecting and connecting widgets" is the age-old dream of GUI development. It didn't work out in VB and Smalltalk - end users can't write good GUI applications in either.

What reason do we have to think the dream will succeed on the Web?

When you talk about "domain specific abstractions", you're implying that the domain of GUIs is well-understood by the end user. It isn't. Good GUIs are hard and scarce, and the difficulty isn't in the coding. Designing a good GUI is orders of magnitude harder than implementing it (in VB, PHP, Rails or the XML vocabulary du jour). DSLs for GUI building are simplifying the wrong part.

### Sorry

My tone above is way too harsh. I apologize, and will try to explain it in a bit more detail...

We may remember the days when people would code in C because they wanted control. Now, we don't want control over our processors, because they've become so much more powerful.

But the GUI designer will always want maximum control over the GUI. The painter will always want maximum control over the canvas. This isn't going to change.

We give the designers a non-Turing-complete language - they ask for more features (e.g. implementing popup menus with ever-more-complex CSS selectors). We give them a Turing-complete language - they start abusing it (e.g. XSLT, which started life as an easy-to-learn DSL for limited XML manipulations).

The specific idea we are discussing, taglibs, is yet another attempt to replace expressive power with Lego bricks. Very enticing at the start, before the feature creep begins.

If your Lego bricks aren't Turing complete, people will demand more features - until Turing completeness creeps in from the back door, by some weird interaction of features. Once your Lego is Turing complete, it becomes a (poorly designed and implemented) general purpose programming language, and people will start using it as such.

### DSLs

I wasn't talking about GUI widgets (perhaps I shouldn't have used the word "widget"). I meant domain specific building blocks. DSLs are a well known approach, and some are - contrary to what you might suppose - successful.

Since NING is about a specific class of applications, I can imagine high level components that can serve many similar applications.

Contrary to the view of some other LtU members, I don't think DSLs must not be Turing-complete. They can be, provided they also provide domain specific abstractions.

Notice that it is quite possible to provide a layered solution (or layered language), so that you can build simple apps using components, and yet retain the ability to use a more powerful (and Turing-complete) language when you need to. In fact, it seems that that's exactly what PHP and XNHTML try to be for Ning.

For examples of DSL success stories, I suggest searching the archives.

### They can be, provided they also provide domain specific abstractions

> They can be, provided they also provide domain specific abstractions.

In my view that is exactly the point of DSLs. They make it easy to do what they were intended to do. You could do other things with them (if they are Turing complete), but that isn't the point. The point is that they make coding in the specific domain they cover easy.

### Exactly

Yes, that's exactly my point.

The terms "DSL" and "little language" often refer to the same thing, and for some a "little language" is one that isn't Turing-complete.

### Not exactly the point

I'm afraid I didn't make myself quite clear, and will reiterate a bit...

My post wasn't an attack on DSLs. Yes, DSLs can be successful. My point was: an end-user DSL for component-based GUI development can't be successful.

OK, if I can't make a convincing argument for the general case, let's switch to the concrete. Take a look at this example from the Ning docs (at the bottom of the page). Can we count the languages used?

• XML (with Ning tags)
• XN's expression language
• HTML
• JavaScript

Especially note all the fun quoting and concatenation going on.

When we're done with this, let's look at this brave attempt at recreating SQL in XML (same source; again, look at the bottom of the page). I especially like the "operator="="" stuff.

Is this a simple language with good domain-specific building blocks?

Yes, one can imagine a good end-user DSL for GUI development. However, as we start to zoom in, the issues become overwhelming. There's essentially no working approach to the problem. Slapping on a bunch of XML tags doesn't cut it.

The only way out for a poor Web designer (like me when I started out) is to get down and actually learn PHP. It's simpler than XNHTML. Really.

### Xul

Actually, as far as DSLs go, I tend to appreciate the approach of Xul (and the similar approach of Xaml). Although it's far from perfect, the idea is:

• describe your GUI in a simple Xml language
• add in some simple interaction with databases with a few specialized tags (the so-called templates)
• script using a well-known (former ?) DSL: JavaScript.

Despite the numerous shortcomings of Xul, I believe that it's a step in the right direction.

Still, I agree that we ought to simplify GUI design rather than just GUI implementation.

### Modifiable DSLs

I think we may be confusing a couple of different conceptions of DSLs here.

One is something like early PHP or, apparently, this XNHTML thing. (I can't tell for sure because I can't access any content at that site due to some nasty cookie issues that are not worth taking the time to work out.) In these, what someone does is code up, in some separate "real" programming language, which I'll call the host language, a simple DSL. Usually this is quite different from and significantly simpler than the host language, and almost invariably the DSL can be extended or changed in any significant way only through the host language, not the DSL itself. Thus, you end up programming either in the DSL or in the host language, and the skills required for each are usually quite different.

Another way of approaching it is to take a general purpose programming language in which you can express a DSL (sadly, not infrequently through some clever but somewhat nasty trickery), and end up with a DSL within the host language. This lets you get at the full power of the host language when necessary, and allows easy evolution of the DSL as you find things that are not working as well as you'd like. The major disadvantage of this approach is that you tend to get more fragile syntax and error messages that are more complex and confusing.

So when you talk about DSLs being good or bad, which one of these kinds of DSLs are you talking about? I personally believe that the flaws of the second approach are far easier to fix than the flaws of the first, and the first is never going to give you more than very limited problem-solving power.

### Examples ?

> Another way of approaching it is to take a general purpose programming language in which you can express a DSL (sadly, not infrequently through some clever but somewhat nasty trickery), and end up with a DSL within the host language.

Sounds like an interesting approach, but I'm not sure I understand it. Would you have any specific example?

### Ruby's Rakefile language

Is a pretty decent example. Glenn Vanderburg mentioned a few other Ruby examples at OSCON.

### I think I got it

I actually remember advocating something similar a few years ago, for game development: instead of inventing (yet) another scripting language for writing and customizing in-game objects and levels, using either Java (as it is a well-known and easy-to-learn language) or an extendable language (I was thinking either Scheme or OCaml), possibly with a custom parser to allow a more demagogic syntax (say, Java).

They eventually reinvented yet another scripting language.

### Portability

The first approach has the (potential) advantage of portability between host languages. Another word for it would be "separation of interface from implementation".

For example, if S-expressions were used instead of XML, I think every Lisp would have an incentive to create its own slightly-incompatible format. OTOH, all XML parsers parse the same language.

### I think that's a bit of a chimera

It's true that XML is lexically more rigidly defined; S-expressions may have slightly different lexical rules from each other. On the other hand, XML's rules can be complex, particularly the ones dealing with whitespace and attribute value normalization. Indeed, it is not possible to fully parse XML without access to the schema, and it would be relatively trivial to construct a meta-S-expression language which was prefixed with some fairly simple lexical rules which would be sufficient to accommodate all the Lisp/Scheme S-expression parsers which are out there. (I would think that a single regular expression would suffice to capture all the variations in parsing of atoms, but maybe I'm being over-optimistic.)

That does nothing to capture the semantics of a given language expressed in XML, or in S-expressions, nor to decide which programs are actually valid in a given application context. So while it is correct to say:

> all XML parsers parse the same language

it is not correct to say that the language accepted by all XML parsers is the language of any given XML application; it is most likely a superset.

Any XML parser can parse XML without a schema.

Maybe you were thinking about validation - it's a separate, optional step.

### Well, there's parsing and parsing...

Actually, that's a debate I've seen a number of times regarding XML.

There's low-level parsing, which turns a stream into an abstract syntax tree (i.e. the XML document with its DOM interface). But then, if you want to get the data in the data structure designed to contain it (as specified, for instance, by the XML Schema), you may still need one step of transformation (DOM => data structure).

While this is not stricto sensu parsing, the task of putting the data represented by the stream into the data structure you intend to use it with is usually something you do during parsing. Most parser generators (Yacc, Bison, etc.) support this feature.
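The two steps can be seen side by side in a short Python sketch using the standard library's ElementTree. The `Post` record and the document layout are made up for the example.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class Post:
    author: str
    replies: int

def load_posts(xml_text):
    """Step 1: generic parse to a tree. Step 2: map the tree to typed records."""
    root = ET.fromstring(xml_text)
    return [
        Post(author=p.get("author"), replies=int(p.findtext("replies")))
        for p in root.findall("post")
    ]

posts = load_posts(
    '<posts><post author="ann"><replies>3</replies></post>'
    '<post author="bob"><replies>0</replies></post></posts>'
)
print(posts)
```

The list comprehension is the hand-written "DOM => data structure" step; a grammar with semantic actions would fold it into the parse itself.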

That's one thing I've always been annoyed with in XML. I guess I should try XDuce/CDuce.

### Yes

To rephrase, we may talk about XML as a data exchange format or an object serialization format.

For data exchange, XML is mostly adequate, even if used "ad hoc" (without a schema). Although if an XML format is to be widely used, I'd want a schema to check conformance against. One may say this is the intended purpose of XML and schemas...

For data structure (object) serialization, XML by itself is inadequate. People seem to want more specifications layered on top, e.g. XML-RPC, SOAP or RDF/XML. (How exactly do we serialize a tuple, or a dictionary? Which of a number of ways do we choose? How about a circular list? An object with behaviour?)
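The "which of a number of ways" question is easy to demonstrate. The Python sketch below serializes one dictionary in two equally plausible XML layouts; both function names and both layouts are invented for illustration, and neither is any standard.

```python
import xml.etree.ElementTree as ET

def dict_as_elements(d):
    """Encoding A: each key becomes an element name."""
    root = ET.Element("dict")
    for key, value in d.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")

def dict_as_entries(d):
    """Encoding B: generic <entry> elements carry the key as an attribute."""
    root = ET.Element("dict")
    for key, value in d.items():
        ET.SubElement(root, "entry", key=key).text = str(value)
    return ET.tostring(root, encoding="unicode")

data = {"name": "ann", "age": 30}
a = dict_as_elements(data)
b = dict_as_entries(data)
print(a)
print(b)
```

Both outputs are well-formed XML, yet a reader written for one will not understand the other, and both lose the fact that 30 was an integer. That gap is what the layered specifications try to close.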

### Fair enough

I like this way of phrasing the distinction.

### C2

This discussion on C2 seems to be relevant.

A REST guy:

> Can anyone explain, without using the words "Supports", "Allows" or "Flexible", just exactly where SOAP makes a positive contribution? Is this really that hard?

A SOAP guy:

> Saying XML is saying something, but it isn't saying enough. A standard for using XML for messaging is required. A standard is SOAP. The standard is what is valuable. A hero without anyone to save can't be a hero.

Obviously, the SOAP guy is looking from the object serialization angle (more concretely, serialization of arguments for an RPC call), and the REST guy from the data exchange angle.

### It's not just about validation

Here's a really simple example: do <foo><bar>a</bar></foo> and <foo><bar>a</bar> </foo> represent the same data or not? You can only know whether or not the space between the two closing tags is semantically significant if you know the schema for <foo>. Without that information, you have to assume that the space is significant, with the result that an XML parser with the schema and an XML parser without the schema may produce different parses (or, if you prefer, different data descriptions). There are similar issues with parsing attribute values.

Any XML parser can lex XML without a schema, but that's a somewhat different problem.
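What a schema-less parser hands back for those two documents can be checked with Python's ElementTree, which stores text following an element in that element's `tail`. The variable names are just for this sketch.

```python
import xml.etree.ElementTree as ET

compact = ET.fromstring("<foo><bar>a</bar></foo>")
spaced = ET.fromstring("<foo><bar>a</bar> </foo>")

# ElementTree keeps text that follows an element in that element's .tail.
compact_tail = compact.find("bar").tail
spaced_tail = spaced.find("bar").tail
print(repr(compact_tail), repr(spaced_tail))
```

The space survives as data in the second tree, so without knowing the content model of `<foo>` the application has to decide for itself whether it matters.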

### Terminology

I think you're a bit wrong. I'll try to explain...

To the best of my knowledge, no XML parser uses the term "lexing" to mean "parsing without a schema".

The result of lexing is a stream of tokens. The result of parsing is a tree of syntax objects (e.g. DOM) or a stream of syntax events (e.g. SAX). Neither DOM nor SAX is about semantics; parsing is a purely syntactic exercise.

XML Schema, however, is a step towards semantics. An XML parser with a schema doesn't just parse. It also carries out some semantic transformations on the resulting syntax tree (removes that whitespace text node). Note that, to carry out those transformations, the XML has to be parsed first :-)

For an analogous problem, let's look at C:

if (i) j++;

vs

if (i) {j++; }

I can expect the parser to create a "simple statement" object for the if's body in the first case, and a "compound statement" object in the second case. Determining that the two cases are semantically equivalent ("normalize to the same form") isn't any business of the parser. The parser just knows the grammar for "simple statement" and "compound statement".

For another example, let's say we have <foo><bar /><baz /></foo> and <foo><baz /><bar /></foo>. Should the parser understand if the order is significant?

To conclude, parsing is only about text and grammars. Parsing isn't a synonym for object deserialization, or normalization, or schema validation, or whatever. All those steps occur after parsing is done.

Sorry if I'm dense... I'm just trying to get the terminology right. This is a PL forum, after all :-)

### A bit of parsing and a little rant

> To the best of my knowledge, no XML parser uses the term "lexing" to mean "parsing without a schema".

Very true. That was my terminology, and I'm prepared to stick to it.

> The result of parsing is a tree of syntax objects

That's certainly one possible parsing result, but it's uncommon. A much more common parsing result is an "Abstract Syntax Tree" (AST). With respect to the C example, I would certainly not be surprised if the redundant { } did not show up in the AST produced by a given C parser; I might even expect redundant ( ) to not show up. Certainly in the parsers I have written, the ASTs which are returned do not include such information. I would definitely expect a parser to remove all vestiges of comments and non-semantic whitespace (except for parsers specifically written for things like pretty-printing).
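The same behaviour can be observed in a parser that is easy to poke at. The sketch below uses Python's own `ast` module rather than a C parser, so it illustrates the general point and says nothing about any particular C front end; `shape` is a helper name invented here.

```python
import ast

def shape(source):
    """Dump the AST without line/column attributes."""
    return ast.dump(ast.parse(source))

# Redundant parentheses and comments leave no trace in the tree...
same_parens = shape("x = (a + b)") == shape("x = a + b")
same_comment = shape("x = 1  # note") == shape("x = 1")
# ...but parentheses that change grouping do.
same_grouping = shape("x = a - (b - c)") == shape("x = a - b - c")
print(same_parens, same_comment, same_grouping)
```

Only the grouping that affects meaning is kept; the surface syntax that produced it is gone by the time the tree exists.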

I note in passing that whitespace is syntactically significant in most programming languages, but it still doesn't show up in parse trees. For example, most modern languages distinguish between DO I = 5 and DOI=5 even though that hasn't always been the case.

The reason that parse trees are rarely produced by parsers is that they are not particularly useful, and the AST can almost always be built without the intermediate step. Indeed, it is not even always necessary to build an AST as such. Attribute grammars, for example, provide a formal specification of semantic rules; they can be used to directly produce intermediate language (or even final byte code in the case of simple DSLs.) So I don't think it's possible to just draw a line in the sand and say "this side is parsing, that side is whatever".

But more to the original point:

> Parsing isn't a synonym for object deserialization

but

> For data exchange, XML is mostly adequate, even if used "ad hoc" (without a schema).

Well, I would say that S-expressions are mostly adequate for data exchange, too, despite the slight variance in lexical rules. And, unlike XML, S-expression parsers are not required to pollute their output with insignificant whitespace or comments. :)

> An XML parser must return all whitespace within the document (outermost) element to the application.

I am acutely aware of that. The ugliness of the code I write to deal with XML streams is largely a product of having to discard insignificant data. The whitespace conventions make perfect sense if XML is used as a mark-up language for primarily textual documents, but just get in the way when it's used as the swiss army knife of data interchange tools.

Although I've more or less come to terms with the XML infestation, I cannot bring myself to see its charm as a way of writing programs. It has almost all of the wrong features: it is extremely bulky; it has far too much redundancy, increasing the probability of typing errors; and reading it requires taking a machete to the trees in order to get a glimpse of the forest. That's just one cynic's opinion, of course.

### Yes

I'm inclined to agree with most of what you say, but still have a minor nit to pick :-)

> Well, I would say that S-expressions are mostly adequate for data exchange, too

They aren't, because they don't deal with encodings.

In my experience, using XML for ad hoc data exchange certainly requires a bit of (boilerplate) code. But once it starts working, it'll continue to work. Encodings, case sensitivity, attribute order sensitivity and other "minor unexpected details" won't ever bite you.

As a Russian speaker, I appreciate the "encodings" part of XML a lot - because Russian has several different encodings, and I've had my share of programming hell with them.
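A small Python illustration of why the encoding declaration matters, assuming the expat-based ElementTree parser that ships with CPython (which handles single-byte encodings such as these); the helper name and the sample text are made up.

```python
import xml.etree.ElementTree as ET

text = "Привет"

def make_doc(encoding):
    """Serialize the same document in a given single-byte Cyrillic encoding."""
    doc = f'<?xml version="1.0" encoding="{encoding}"?><msg>{text}</msg>'
    return doc.encode(encoding)

cp1251_bytes = make_doc("windows-1251")
koi8_bytes = make_doc("koi8-r")

# Different bytes on the wire, same characters after parsing.
decoded = [ET.fromstring(b).text for b in (cp1251_bytes, koi8_bytes)]
print(cp1251_bytes != koi8_bytes, decoded)
```

The receiver never has to guess the encoding, because each document names its own.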

### Confusion of levels

An XML parser must return all whitespace within the document (outermost) element to the application. If it has information from the schema, it may mark such whitespace as inessential, but it must not leave it out altogether. It is up to the application whether to respect that mark or not.

Whitespace crunching in attribute values is another matter, and an acknowledged flaw in XML that's too late to fix. A parser that does not read the (external) schema may fail to crunch whitespace that it should. Document authors can prevent this by including a minimal schema within the document.
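A sketch of the "minimal schema within the document" workaround, in Python. It assumes a parser that reads the internal DTD subset, as the expat-based ElementTree in CPython does; the element and attribute names are invented.

```python
import xml.etree.ElementTree as ET

bare = '<foo bar="  a   b  "/>'
# Internal DTD subset declaring the attribute as tokenized (NMTOKENS).
declared = (
    "<!DOCTYPE foo [<!ATTLIST foo bar NMTOKENS #IMPLIED>]>"
    '<foo bar="  a   b  "/>'
)

bare_value = ET.fromstring(bare).get("bar")
declared_value = ET.fromstring(declared).get("bar")
print(repr(bare_value), repr(declared_value))
```

With no declaration the attribute is treated as CDATA and its spaces are preserved. With the in-document declaration the parser should collapse them, without ever fetching an external schema.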