Lambda the Ultimate

Native XML Scripting
started 9/20/2002; 2:34:04 PM - last post 3/29/2003; 4:25:58 AM

Ehud Lamm - Native XML Scripting

9/20/2002; 2:34:04 PM (reads: 2214, responses: 10)

Native XML Scripting

Adding native XML support to ECMAScript. Plus: Jon Udell's comments

Making XML a native datatype is something we discussed in the past. This seems like something that's actually going to happen.

Posted to xml by Ehud Lamm on 9/20/02; 2:34:57 PM

Martin Bravenboer - Re: Native XML Scripting

9/20/2002; 5:24:41 PM (reads: 1623, responses: 0)

I've searched the web for more information on this so-called E4X group, but I (well, Google ;) ) wasn't able to find anything! The only thing I found is this tutorial, which is also from BEA.

This indeed looks interesting and will probably be something ordinary programmers will actually use. XDuce, for example, is another interesting language with XML as its (only) native data type, but I'm afraid a lot of programmers cannot handle that.

Dan Shappir - Re: Native XML Scripting

9/22/2002; 4:41:45 AM (reads: 1584, responses: 2)

Native XML scripting does look cool, but I would like to point out that you can already do much of what has been described in the article using the BeyondXML module of BeyondJS on IE and Mozilla. For example, where Native XML lets you write:

var x = <test><a>hello</a><a>there</a><b>world</b></test>;

BeyondJS lets you write:

var x = "<test><a>hello</a><a>there</a><b>world</b></test>".parseFromString();

which assigns the DOM node to x. Not quite the same, but pretty close.

Another example: in Native XML you iterate over node lists using the standard ECMAScript for ( in ) syntax:

var total = 0; for (var item in order.item) total += item.price * item.quantity;

Using BeyondXML the syntax becomes:

var total = 0; "order/item".selectNodes(d).foreach(function(item) { total += "price".value(item) * "quantity".value(item); });

where 'd' is a reference to the DOM node. Again, not quite the same thing, but close IMO. It would have been even closer had IE and MSXML provided prototypes for the DOM objects.
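For comparison, here is the same total computed over plain ECMAScript data, with no DOM involved. This is only an illustrative sketch: order is a hypothetical object standing in for the parsed document, and its item property is assumed to be an array of { price, quantity } records. It uses for...of because a plain for...in loop over an array yields the index keys ("0", "1", ...) rather than the elements.

```javascript
// Hypothetical stand-in for the <order> document: item is an array of records.
const order = {
  item: [
    { price: 2.5, quantity: 4 },
    { price: 10, quantity: 1 },
  ],
};

// for...of visits the elements; for...in would visit the keys "0" and "1".
function orderTotal(order) {
  let total = 0;
  for (const item of order.item) {
    total += item.price * item.quantity;
  }
  return total;
}

// The same computation in the callback style of the BeyondXML sample.
const total = order.item.reduce((sum, item) => sum + item.price * item.quantity, 0);
```

Both orderTotal(order) and total come to 20 for this data.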

I certainly applaud the efforts of the E4X group, and ECMAScript's intrinsic extensibility makes it a natural candidate for such extensions. However, BeyondJS lets you achieve similar functionality today, using standard JavaScript (.js files, not .jsx) and without requiring extensions to the ECMAScript syntax.

Ehud Lamm - Re: Native XML Scripting

9/22/2002; 8:04:14 AM (reads: 1619, responses: 0)

Cool stuff (as we've come to expect from you!)

So now let's rephrase the question: what does it mean to make something (XML, for example) a native datatype?

pixel - Re: Native XML Scripting

9/22/2002; 9:19:58 AM (reads: 1620, responses: 0)

Dan Shappir - Re: Native XML Scripting

9/22/2002; 9:21:31 AM (reads: 1552, responses: 0)

In the context of XML, I think a distinction needs to be made between representing Infoset nodes as native datatypes and intrinsically supporting the "standard" XML serialization syntax embedded in the code.

While the latter seems very natural and appealing to almost anybody who works with XML, it almost certainly necessitates modifications to the underlying language syntax (and maybe even semantics). This can easily result in all sorts of parsing problems (which ECMAScript already has, BTW).

I think it's more important to be able to work with the Infoset using the language's native facilities. My main problem with the XML DOM as a representation of the Infoset is that it behaves differently from standard JS datatypes for no good reason (other than the need to be compatible with other languages). For example: the lack of prototypes and dynamic properties I mentioned before.

I think it could be quite possible to serialize the Infoset into some sort of representation that better matches standard ECMAScript facilities. For example, attributes might be accessed as object properties. Represented this way, the XML becomes much easier to manipulate from an ECMAScript program.
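A minimal sketch of that idea using only plain objects. The input shape ({ name, attributes, children }, with text nodes as strings) and the helpers toNative and childrenOf are hypothetical, not any real DOM API. Attributes become ordinary properties, while the tag name and child list are stored under ES2015 Symbols (which did not exist in 2002) so they cannot collide with attribute names.

```javascript
// Symbols keep node properties (tag name, children) apart from attribute names.
const NAME = Symbol("name");
const CHILDREN = Symbol("children");

// Assumed input shape: { name, attributes, children }, with text nodes as plain strings.
function toNative(node) {
  if (typeof node === "string") return node;
  const el = { ...node.attributes }; // attributes become ordinary properties
  el[NAME] = node.name;
  el[CHILDREN] = (node.children || []).map(toNative);
  return el;
}

// Child elements with a given tag name, in document order.
function childrenOf(el, name) {
  return el[CHILDREN].filter((c) => typeof c !== "string" && c[NAME] === name);
}

const employee = toNative({
  name: "employee",
  attributes: { id: "42" },
  children: [{ name: "age", attributes: {}, children: ["31"] }],
});
const id = employee.id; // read like any other property
employee.title = "Dr";  // dynamic properties work as usual
```

Because the node properties are Symbol-keyed, Object.keys(employee) and for...in see only the attributes.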

This perhaps also shows the downside of something like the CLR. When you create a single library for many target languages, you almost certainly avoid language-specific features. In other words, you go for the (lowest) common denominator.

A problem with representing the Infoset in ECMAScript has to do with the distinction XML makes between node attributes, node children and node properties (stuff like name and namespace). This doesn't quite come naturally to ECMAScript, although Sjoerd and I have been throwing around some ideas, e.g. using the property access operator [] to get attributes and the function call operator () to get at child nodes. This is very preliminary, however, and not likely to make its way into Beyond in any event (we are not out to write a new XML DOM, at least not now :-)
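The [] / () idea can be prototyped today with an ES2015 Proxy wrapped around a function, a feature that arrived long after this thread. In this sketch (all names hypothetical, and not Beyond's API), property access reads attributes, calling the node with a tag name returns the matching child nodes, and the node's own tag name is kept in a WeakMap so it never clashes with an attribute.

```javascript
// The tag name is kept outside the proxy so it can never clash with an attribute.
const tagOf = new WeakMap();

function xmlNode(tag, attrs = {}, children = []) {
  // Function call: node("a") returns the child elements named "a"; node() returns all children.
  const target = (childTag) =>
    childTag === undefined
      ? children.slice()
      : children.filter((c) => typeof c === "function" && tagOf.get(c) === childTag);

  const node = new Proxy(target, {
    // Property access: node.id or node["id"] returns the attribute value, or undefined.
    get: (_target, prop) =>
      typeof prop === "string" && Object.prototype.hasOwnProperty.call(attrs, prop)
        ? attrs[prop]
        : undefined,
  });
  tagOf.set(node, tag);
  return node;
}

const doc = xmlNode("test", { id: "t1" }, [
  xmlNode("a", {}, ["hello"]),
  xmlNode("a", {}, ["there"]),
  xmlNode("b", {}, ["world"]),
]);
const id = doc["id"];          // attribute access
const aNodes = doc("a");       // the two <a> children
const firstText = aNodes[0](); // children of the first <a>: ["hello"]
```

The price of this design is that the node no longer behaves like a normal function object (doc.call is undefined, for instance), which is one more sign of how awkward the attribute/child/property split is in ECMAScript.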

The fact that ECMAScript is functional, in the sense that it supports functions as first-class citizens, should make it easier to write filtering and manipulation code. For some reason, though, the E4X group has not gone down that road, e.g. their sample:

var over27inEng = y.employees.employee.(department.@id == 500 && age > 27);

instead of something like:

var over27inEng = y.employees(function(employee) { return employee.department.@id == 500 && employee.age > 27});
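Over plain arrays, the predicate style suggested above is simply Array.prototype.filter. A minimal sketch, assuming the employees have already been turned into ordinary objects; the records are made up, and department.id stands in for the @id attribute, since @ is not valid in standard ECMAScript.

```javascript
// Hypothetical records; department.id stands in for the E4X attribute department.@id.
const employees = [
  { name: "Ann", age: 31, department: { id: 500 } },
  { name: "Bob", age: 25, department: { id: 500 } },
  { name: "Cy", age: 40, department: { id: 600 } },
];

// The filter condition is an ordinary first-class function...
const inEngOver27 = (employee) => employee.department.id === 500 && employee.age > 27;
const over27inEng = employees.filter(inEngOver27);

// ...so conditions can also be built from smaller ones and reused.
const both = (p, q) => (x) => p(x) && q(x);
const inEng = (employee) => employee.department.id === 500;
const over27 = (employee) => employee.age > 27;
const sameResult = employees.filter(both(inEng, over27));
```

Composition like both(inEng, over27) is what a special-purpose filter syntax gives up.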

Patrick Logan - Re: Native XML Scripting

9/22/2002; 5:59:22 PM (reads: 1520, responses: 0)

I think SXML demonstrates that Lisp's lists are essentially a superset of "Native XML".

http://okmij.org/ftp/Scheme/xml.html#SXML-spec

One thing that appears missing from Native XML is a succinct quasiquote syntax.

http://www.math.grin.edu/courses/Scheme/r5rs-html/r5rs_37.html

Native XML is a good idea, but I'd like to see a comparison with Lisp's lists and SXML in order to understand how new and expressive it may be.
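To make the comparison concrete, SXML's idea (elements as nested lists, with an @ list for attributes) carries over directly to nested ECMAScript arrays. A minimal sketch: the array layout only loosely mirrors SXML, sxmlToString is a hypothetical helper, and ordinary array literals with embedded expressions stand in for Scheme's quasiquote.

```javascript
// Escape characters that are significant in XML text and attribute values.
const escapeXml = (s) =>
  String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");

// Serialize an SXML-style nested array: [tag, ["@", [attr, value], ...]?, ...children].
// Anything that is not an array is treated as a text node.
function sxmlToString(node) {
  if (!Array.isArray(node)) return escapeXml(node);
  const [tag, ...rest] = node;
  let attrs = "";
  if (Array.isArray(rest[0]) && rest[0][0] === "@") {
    const [, ...pairs] = rest.shift();
    attrs = pairs.map(([name, value]) => ` ${name}="${escapeXml(value)}"`).join("");
  }
  return `<${tag}${attrs}>${rest.map(sxmlToString).join("")}</${tag}>`;
}

// Plain array literals with embedded expressions do the job of quasiquote here.
const who = "world";
const doc = ["test", ["@", ["id", "t1"]], ["a", "hello"], ["b", who]];
const xml = sxmlToString(doc); // '<test id="t1"><a>hello</a><b>world</b></test>'
```') throw new Error("serialization: " + xml);
if (sxmlToString(["a", "x < y & z"]) !== "<a>x &lt; y &amp; z</a>") throw new Error("text escaping");
if (sxmlToString(["e"]) !== "<e></e>") throw new Error("empty element");
</test>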

Dan Shappir - Re: Native XML Scripting

9/23/2002; 1:53:35 AM (reads: 1528, responses: 1)

Some additional thoughts on this subject:

It's not surprising that they chose ECMAScript for XML scripting. There are several practical reasons that make this a reasonable choice: ECMAScript is highly familiar, both because of its C/C++/Java-like syntax (which may be a bit misleading at times) and because it's arguably the most commonly used scripting language. Additionally, ECMAScript is supported on almost every platform (including the JVM) and its runtime is installed on most computers. Finally, there are several open source implementations available.

There are also several technological reasons why ECMAScript is a good choice. Firstly, ECMAScript objects are completely dynamic allowing addition and removal of properties at run-time. This is compatible with the dynamic nature of XML documents. Secondly, ECMAScript allows property access both using the common notation object.property and through an expression object["property"]. This is also a useful feature in this context. Thirdly you can easily iterate over object properties using the for ( in ) statement.
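The three language features listed above are easy to see in a few lines of plain ECMAScript. A small illustrative sketch, with a hypothetical item object standing in for an XML element:

```javascript
// A plain object standing in for an element, with attributes as properties.
const item = { sku: "A-1", price: "2.50" };

item.quantity = "4"; // properties can be added at run time...
delete item.sku;     // ...and removed again

const field = "price";
const price = item[field]; // bracket access with a name computed at run time

// for...in enumerates property names, so generic code can walk any such object.
const names = [];
for (const key in item) names.push(key);
```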

Having said all this, there are some problems with this implementation, some having to do with differences between the ECMAScript object model and XML, others having to do with specific implementation decisions made by this group.

One major problem, as I've already noted, is that XML nodes differentiate between attributes, child nodes and node properties such as namespace and owner document. ECMAScript, OTOH, doesn't naturally make such distinctions. This implementation introduces @ as a notation for XML attributes, a solution that is not in sync with the rest of the ECMAScript language. I don't know what their solution is for things like namespaces; it doesn't appear in the article.

Another problem has to do with the fact that in ECMAScript object properties are unordered (except for array elements). This is incompatible with the XML model, and the article doesn't demonstrate how a new node can be inserted at a specific index position.
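One way around unordered properties is to keep child nodes in an array, which is ordered, and reserve object properties for attributes, whose order doesn't matter in XML. A minimal sketch under that assumption; the { name, attributes, children } shape and the helpers element and insertChild are hypothetical, not part of E4X or the DOM.

```javascript
// Children live in an array, which is ordered; attributes stay in a plain object.
function element(name, attributes = {}, children = []) {
  return { name, attributes, children };
}

// Insert a child at a given position; out-of-range indices are clamped to the ends.
function insertChild(parent, index, child) {
  const i = Math.max(0, Math.min(index, parent.children.length));
  parent.children.splice(i, 0, child);
  return parent;
}

const list = element("list", {}, [element("a"), element("c")]);
insertChild(list, 1, element("b")); // children are now a, b, c
```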

Finally, ECMAScript allows only a single object property with a given name, while XML allows numerous child nodes with the same name. This implementation handles the difference by automatically converting such properties to arrays (the browser DOM does something similar). How a property transitions from a single value (when it's not an array) to multiple values (when it is) is not clear, however.
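The ambiguity is easy to reproduce. In this sketch (hypothetical helpers, not the actual behavior of E4X or the browser DOM), addChild stores the first child under a name as a single value and silently switches to an array on the second, so client code must check which case it is in. An accessor that always returns an array, like childList, hides the transition.

```javascript
const has = (obj, name) => Object.prototype.hasOwnProperty.call(obj, name);

// Store children by name: a single value at first, switching to an array on the second add.
function addChild(obj, name, value) {
  if (!has(obj, name)) obj[name] = value;                   // first child: plain value
  else if (Array.isArray(obj[name])) obj[name].push(value); // third and later: append
  else obj[name] = [obj[name], value];                      // second child: convert to array
  return obj;
}

// Always return an array, whether there are zero, one or many children with that name.
function childList(obj, name) {
  if (!has(obj, name)) return [];
  return Array.isArray(obj[name]) ? obj[name] : [obj[name]];
}

const parent = {};
addChild(parent, "item", "x"); // parent.item is "x"
addChild(parent, "item", "y"); // parent.item is now ["x", "y"]
```

Note that the scheme still breaks down if a child value can itself be an array, which is one argument for always storing a list.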

With regard to this specific implementation, while the ability to write XML code directly into the script looks cool, I think using strings would have been quite sufficient (see the BeyondXML sample above). After all, serialized XML is just a series of characters. It would undoubtedly have made the syntax of the resulting language a lot simpler (and, as I've pointed out, ECMAScript already has some syntactical inconsistencies).

Also, their filtering operator avoids the use of function arguments for some reason. Perhaps they weren't aware of this ECMAScript feature. Finally, they appear to have overloaded the true symbols even more (as if ECMAScript doesn't overload them enough already). Indeed, I think their <{annotation}>{value}</{annotation}> is a bad idea, if only because {annotation} has to be written twice. I think something along the lines of the Beyond style, "annotation".value(value), makes more sense.
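For the string-based route, a small helper is enough to write a computed tag name once and escape the value. This is a hypothetical sketch, not BeyondXML's value method (whose exact behavior isn't described here); textElement and escapeText are illustrative names.

```javascript
// Escape characters that are significant in XML character data.
function escapeText(s) {
  return String(s).replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Build "<name>value</name>" with the (possibly computed) tag name written only once.
// The name is assumed to be a valid XML name; it is not checked here.
function textElement(name, value) {
  return `<${name}>${escapeText(value)}</${name}>`;
}

const annotation = "note";
const xml = textElement(annotation, "a < b"); // "<note>a &lt; b</note>"
```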

Having said all this, I do like seeing ECMAScript being used in new and innovative ways.

Ehud Lamm - Re: Native XML Scripting

9/23/2002; 2:06:47 PM (reads: 1603, responses: 0)

Thanks for the detailed analysis. I'll have to read it more carefully to comment.

It's not surprising that they chose ECMAScript for XML scripting.

Sure. Still, I wonder how XML processing would be embedded in other languages. Since I wrote quite a lot of Rexx code, back when I was an MVS systems programmer, I tend to think about Rexx when discussing glue languages.

I haven't kept up to date with what IBM is doing with Rexx, but I guess they are thinking about this. It would be interesting to see what they come up with.

Rexx stem variables (essentially multi-level associative arrays) can be used to represent XML trees. A clever implementation can use this to model a large part of XML processing. The rest is going to be a bit harder.
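Rexx itself is outside this thread's code, but the compound-variable style is easy to imitate in ECMAScript with a flat Map from dotted keys to values. A rough sketch, assuming the common Rexx convention that stem.0 holds the item count; toStem and its { name, children } input shape are hypothetical.

```javascript
// Flatten an element tree into Rexx-stem-like compound keys:
//   "<prefix>.<tag>.0" holds how many <tag> children there are,
//   "<prefix>.<tag>.<i>" holds the text of the i-th one (1-based).
// Assumed input shape: { name, children }, with text nodes as plain strings.
function toStem(node, prefix = node.name, out = new Map()) {
  const counts = new Map();
  for (const child of node.children) {
    if (typeof child === "string") continue; // text was already stored under this node's key
    const n = (counts.get(child.name) || 0) + 1;
    counts.set(child.name, n);
    const key = `${prefix}.${child.name}.${n}`;
    out.set(`${prefix}.${child.name}.0`, n); // running count, ends as the total
    const text = child.children.filter((c) => typeof c === "string").join("");
    if (text) out.set(key, text);
    toStem(child, key, out); // grandchildren get keys like "test.a.1.b.0"
  }
  return out;
}

const stem = toStem({
  name: "test",
  children: [
    { name: "a", children: ["hello"] },
    { name: "a", children: ["there"] },
    { name: "b", children: ["world"] },
  ],
});
```

Here stem holds test.a.0 = 2, test.a.1 = "hello", test.a.2 = "there", test.b.0 = 1 and test.b.1 = "world". Attributes, mixed content and document order across different tag names are exactly the parts that don't fit this scheme.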

Dan Shappir - Re: Native XML Scripting

9/24/2002; 12:33:59 PM (reads: 1430, responses: 0)

I too have fond memories of Rexx, my first true scripting language, back from my VM/CMS days. At the time, Rexx became my favorite PL, which is not surprising considering that, coming out of university in '88, the other languages I was familiar with were C, Pascal, Basic and a little bit of Scheme (too little to make a difference; I just didn't get it at the time).

I lost contact with Rexx when I moved back to DOS and then Windows. Also I got immersed in OOP, first with Object Pascal and then C++, so my PL itinerary was full for a while.

I do sort of remember Rexx associative arrays, and the word-counting program I wrote to test them out (comparing it to an AWK sample). I don't remember them well enough to compare them to ECMAScript objects, which are also implemented as associative arrays. This type of data structure does appear to match XML processing requirements; however, as I've pointed out, there are some problems, such as the distinction XML makes between attributes and child nodes and the fact that XML allows multiple children with the same name.

BTW, a nice feature I like in Native XML scripting is the use of the .. operator to get a list of all subnodes. I think this feature could be extended to retrieve a list of all sub-properties under any type of ECMAScript object hierarchy. While I can't say when I would use it (perhaps to model XML processing :-) it is cool.
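Generalizing .. to arbitrary ECMAScript objects takes only a short recursive walk. A minimal sketch; descendants is a hypothetical helper that visits own enumerable string-keyed properties (via Object.keys) depth-first, and keeps a Set of visited objects so cyclic structures don't loop forever.

```javascript
// Collect every value stored under property `name` anywhere below `root`, depth-first.
function descendants(root, name) {
  const found = [];
  const seen = new Set();
  (function walk(node) {
    if (node === null || typeof node !== "object" || seen.has(node)) return;
    seen.add(node);
    for (const key of Object.keys(node)) {
      if (key === name) found.push(node[key]);
      walk(node[key]);
    }
  })(root);
  return found;
}

const company = {
  dept: [
    { name: "eng", employee: [{ name: "Ann" }, { name: "Bob" }] },
    { name: "ops", employee: { name: "Cy" } },
  ],
};
const allNames = descendants(company, "name"); // ["eng", "Ann", "Bob", "ops", "Cy"]
```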

In any event, thanks for the chance to walk down memory lane.

Ehud Lamm - Re: Native XML Scripting

3/29/2003; 4:25:58 AM (reads: 1192, responses: 0)

Internet Wire: New ECMA International Standard, ECMAScript for XML (E4X), to Unlock the Power of XML for Web Developers