Lambda the Ultimate

What's Wrong with XML APIs
started 5/29/2003; 2:44:16 PM - last post 6/2/2003; 7:07:30 AM
Ehud Lamm - What's Wrong with XML APIs
5/29/2003; 2:44:16 PM
What's Wrong with XML APIs
Ostensibly this interview with Elliotte Rusty Harold is about library design, specifically the design of the standard XML APIs. However, as we well know, API design can depend heavily on the programming language. As a simple example, consider the statement "[i]n a tree-based API, an XML document is read by a parser, and the parser constructs an object model, typically around a tree with nodes for elements... The entire document is stored in memory." A tree based API in a lazy language wouldn't necessarily have to store the entire document in memory before further processing can be done.
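
To make the lazy-tree idea a little more concrete, here is a rough sketch in Python (chosen only for illustration; the interview doesn't prescribe a language). Python isn't lazy, but a generator over the standard library's xml.etree.ElementTree.iterparse approximates the idea: each record's subtree is built, handed to the consumer, and then cleared. The function name iter_records and the <rec> element are made up for the example.

```python
import io
import xml.etree.ElementTree as ET

def iter_records(source, tag):
    """Lazily yield each complete <tag> subtree as soon as its end tag is parsed.

    After the consumer moves on, the subtree is cleared so finished records do
    not pile up in memory (the emptied elements stay attached to the root).
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield elem      # the consumer sees a fully built subtree here
            elem.clear()    # runs when the consumer asks for the next record

# Usage: only the current <rec> subtree is fully populated at any time.
doc = io.BytesIO(b"<log><rec>a</rec><rec>b</rec><rec>c</rec></log>")
texts = [rec.text for rec in iter_records(doc, "rec")]  # ['a', 'b', 'c']
```

Note the trade-off: a consumer that holds on to the yielded elements (say, list(iter_records(...))) will find them already cleared, so this only works for record-at-a-time processing.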

Aside from being a nice source of examples of this sort showing the interaction between language features and library design, this interview is related to one of our recurring topics, namely whether programming languages should offer built-in support for XML processing.


Posted to xml by Ehud Lamm on 5/29/03; 2:51:25 PM

Ehud Lamm - Re: What's Wrong with XML APIs
5/29/2003; 3:17:36 PM
This is somewhat related.

Dan Shappir - Re: What's Wrong with XML APIs
5/30/2003; 12:31:31 PM
A tree based API in a lazy language wouldn't necessarily have to store the entire document in memory before further processing can be done.

While this is true in principle (and not only for lazy languages; think about the flyweight design pattern), I believe it wouldn't be true in practice. Given how XML is usually serialized (the angle brackets we all know and love ;-), you would generally need to read and at least partially parse the entire document before allowing access to its components. Consider an XML document coming in over a socket, where you want the content of the last node. You must read the entire document into memory and, to locate that node, partially parse all the preceding nodes, even in a lazy PL.
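
The socket scenario can be sketched with Python's incremental xml.etree.ElementTree.XMLPullParser, assuming the network data arrives as a sequence of byte chunks (the chunk list below stands in for a real socket, and last_item_text is a hypothetical helper). The parser never needs the whole document as one string, yet it still has to parse every earlier <item> before it can tell which one is last.

```python
import xml.etree.ElementTree as ET

def last_item_text(chunks, tag="item"):
    """Feed byte chunks (e.g. as read from a socket) to an incremental parser.

    Returns (text of the last <tag>, number of <tag> elements parsed). An
    element is only known to be the last one once the stream ends, so every
    earlier element has to be parsed along the way.
    """
    parser = ET.XMLPullParser(events=("end",))
    last, seen = None, 0

    def drain():
        nonlocal last, seen
        for _event, elem in parser.read_events():
            if elem.tag == tag:
                seen += 1
                last = elem.text

    for chunk in chunks:
        parser.feed(chunk)   # chunk boundaries may fall mid-tag
        drain()
    parser.close()           # end of stream: flush and check well-formedness
    drain()
    return last, seen

# Chunks split at arbitrary points, as they might arrive over the network.
chunks = [b"<feed><item>1</it", b"em><item>2</item><ite", b"m>3</item></feed>"]
result = last_item_text(chunks)  # ('3', 3)
```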

Indeed, for this reason, I have often found XML to be a useful format only when all the data contained in the document can, in one way or another, be stored completely in RAM. The exceptions to this rule are the pull/push type APIs where only a specific section of the data is required (and you still need to parse the whole document up to that point) and, more interestingly, the query APIs (the fifth type in the article). That type of API could be used naturally over a serialization format that supports random access to nodes.
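
The pull exception (stop once the needed section has been seen) can be sketched in Python as follows; find_first is a hypothetical helper and the chunk iterator again stands in for a network stream. Returning from the loop means later chunks are never read or parsed, while every chunk before the match still is.

```python
import xml.etree.ElementTree as ET

def find_first(chunks, tag, attr, value):
    """Pull-parse chunks until an element <tag attr="value"> is complete.

    Stops at the first match: the remaining chunks are never read or parsed,
    but every chunk up to and including the match has been.
    """
    parser = ET.XMLPullParser(events=("end",))
    for chunk in chunks:
        parser.feed(chunk)
        for _event, elem in parser.read_events():
            if elem.tag == tag and elem.get(attr) == value:
                return elem
    return None

stream = iter([b"<r><x id='1'/>", b"<x id='2'/>", b"<x id='3'/>", b"</r>"])
match = find_first(stream, "x", "id", "2")
leftover = list(stream)   # chunks the parser never touched
```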

With regard to a point made in the article by Elliotte Rusty Harold about pull APIs being more natural (he calls it simpler) than push APIs: Microsoft seems to agree (they've built a pull API into .NET), but I disagree. I believe a push API is more natural in this context, it's just different than what many programmers are used to. Maybe if more had used functional PLs ...
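
For comparison, here is the same small task (collecting <title> text) written in both styles with Python's standard library: xml.sax is push (the parser calls your handler), while xml.dom.pulldom is pull (your loop asks for the next event). Python's pull API is only a stand-in for the .NET one; the document and function names are invented for the example.

```python
import xml.sax
from xml.dom import pulldom

XML_DOC = "<lib><book><title>SICP</title></book><book><title>HtDP</title></book></lib>"

class TitleHandler(xml.sax.ContentHandler):
    """Push style: the parser calls us, so state between callbacks lives on self."""

    def __init__(self):
        super().__init__()
        self.titles, self._buf, self._in_title = [], [], False

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title, self._buf = True, []

    def characters(self, content):
        if self._in_title:          # text may arrive in several pieces
            self._buf.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buf))
            self._in_title = False

def titles_push(text):
    handler = TitleHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.titles

def titles_pull(text):
    """Pull style: we request events, so state lives in ordinary local variables."""
    titles = []
    stream = pulldom.parseString(text)
    for event, node in stream:
        if event == pulldom.START_ELEMENT and node.tagName == "title":
            stream.expandNode(node)  # build just this element's subtree
            titles.append("".join(c.data for c in node.childNodes
                                  if c.nodeType == c.TEXT_NODE))
    return titles
```

Which one reads as "natural" is exactly the question being argued; the push version spreads the logic across callbacks, the pull version keeps it in one loop.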

Ehud Lamm - Re: What's Wrong with XML APIs
5/30/2003; 2:20:49 PM
I believe it wouldn't be true in practice.

I agree.

Dan Shappir - Re: What's Wrong with XML APIs
5/30/2003; 3:04:35 PM
I believe a push API is more natural in this context, it's just different than what many programmers are used to.

An interesting question has occurred to me: why do so many programmers see event-driven GUI development as a GOOD THING, yet perceive event-driven XML processing as weird and unnatural?

Toby Reyelts - Re: What's Wrong with XML APIs
5/30/2003; 4:14:55 PM
Why do so many programmers see event-driven GUI development as a GOOD THING, yet perceive event-driven XML processing as weird and unnatural?

I'm not so sure that it's weird and unnatural - just time consuming. In the large body of work that I've done with XML, the times are few and far between in which it made sense for me to use an API like SAX. If I used SAX all of the time, I'd be spending large amounts of my time manually creating the same context that the DOM API automatically creates for me. I think this is why tools like JAXB are becoming prevalent. People generally need to walk the entire document anyway, and having a tool that automatically generates a language binding to the document just makes that job easier. As an aside, even where it did make sense to use SAX, it's now probably more productive to use XPath. The only drawback is that XPath tends to be very slow.
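
The "manually creating the same context" point can be illustrated with a small Python sketch (the catalog document and names are invented). With SAX, telling author/name apart from publisher/name means keeping our own element stack; with a tree API the same question is one path expression. ElementTree's findall implements only a limited subset of XPath, which happens to be enough here; nothing is implied about relative speed.

```python
import xml.sax
import xml.etree.ElementTree as ET

DOC = """<catalog>
  <book><author><name>Abelson</name></author><publisher><name>MIT Press</name></publisher></book>
  <book><author><name>Felleisen</name></author><publisher><name>MIT Press</name></publisher></book>
</catalog>"""

class AuthorNames(xml.sax.ContentHandler):
    """SAX: the handler must track the element path itself."""

    def __init__(self):
        super().__init__()
        self.path, self.names, self._buf = [], [], []

    def startElement(self, name, attrs):
        self.path.append(name)   # context the DOM would have given us for free
        self._buf = []

    def characters(self, content):
        self._buf.append(content)

    def endElement(self, name):
        if self.path[-2:] == ["author", "name"]:
            self.names.append("".join(self._buf).strip())
        self.path.pop()

def author_names_sax(text):
    handler = AuthorNames()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.names

def author_names_tree(text):
    # Tree API: the context is already in the structure.
    root = ET.fromstring(text)
    return [name.text for name in root.findall(".//author/name")]
```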

andrew cooke - Re: What's Wrong with XML APIs
5/30/2003; 7:25:56 PM
An interesting question has occurred to me: why do so many programmers see event-driven GUI development as a GOOD THING, yet perceive event-driven XML processing as weird and unnatural?

i think it's just that tree manipulation (in memory) is such a basic part of computer science that you become accustomed to it (is wearing clothes "natural"? ;o). when i first learnt about xml processing, dom felt "right" to me. these days i normally use sax, because it makes more sense in our application - in broad terms we use xml for data transport, not data processing, so there's no need for global transformations (if a client does want something weird we have the possibility to pipe it through xsl - with the possibly forlorn hope that someone else has put the effort into implementing the compiler so that it's efficient with streamed data and doesn't simply construct the whole dom tree...). we also run (x)html through sax - again, web pages tend to be top-to-bottom things (cocoon uses sax too, i believe).
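
A transport-style SAX pipeline of this kind (data flows top to bottom, nothing needs a global view) might look like this in Python: an xml.sax.saxutils.XMLFilterBase subclass that drops a chosen element and its subtree while streaming everything else straight into an XMLGenerator. The DropElements class and the <debug> element are hypothetical; the filter itself keeps only a depth counter, not the document.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator

class DropElements(XMLFilterBase):
    """Pass SAX events through, except for the named elements and their subtrees."""

    def __init__(self, parent, drop):
        super().__init__(parent)
        self._drop = set(drop)
        self._skip_depth = 0    # > 0 while inside a dropped subtree

    def startElement(self, name, attrs):
        if self._skip_depth or name in self._drop:
            self._skip_depth += 1
            return
        super().startElement(name, attrs)

    def endElement(self, name):
        if self._skip_depth:
            self._skip_depth -= 1
            return
        super().endElement(name)

    def characters(self, content):
        if not self._skip_depth:
            super().characters(content)

data = b"<page><p>hi</p><debug>secret<x/></debug><p>bye</p></page>"
out = io.StringIO()
pipeline = DropElements(xml.sax.make_parser(), {"debug"})
pipeline.setContentHandler(XMLGenerator(out, encoding="utf-8"))
pipeline.parse(io.BytesIO(data))
```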

irrelevant but "amusing" bug: at least one implementation caches the objects used to carry attribute information and then re-uses them once outside the element scope, which gives the strangest errors if you store and replay sax streams (replaying objects that have changed their contents....) (ok, the whole idea that you'd store and replay sax streams as objects seems a little odd, maybe - why not store xml as text? - but it seemed like a good idea at the time)
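
One defensive fix for that replay bug is to copy attribute data into plain values while recording, rather than keeping the parser's attributes object. Whether a particular parser reuses that object is implementation-specific; copying makes the recording safe either way. Recorder and replay are illustrative names, not part of any library.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl

class Recorder(ContentHandler):
    """Record SAX events as plain tuples so they can be replayed later."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Copy into a dict: never store the parser's attrs object itself,
        # since a parser is free to reuse it for later elements.
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        self.events.append(("chars", content))

def replay(events, handler):
    """Feed recorded events to any ContentHandler."""
    handler.startDocument()
    for event in events:
        if event[0] == "start":
            handler.startElement(event[1], AttributesImpl(event[2]))
        elif event[0] == "end":
            handler.endElement(event[1])
        else:
            handler.characters(event[1])
    handler.endDocument()

rec = Recorder()
xml.sax.parseString(b'<r><a id="1"/><a id="2">x</a></r>', rec)
out = io.StringIO()
replay(rec.events, XMLGenerator(out))
```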

in the end, you choose the right tool for the job, i guess.

Dominic Fox - Re: What's Wrong with XML APIs
6/2/2003; 7:07:30 AM
I imagine you could scan through a file once and build a "skeletal" DOM consisting of file pointers, then go back and read selected regions for more detail as you drilled down through the tree - like having an indexed document. You have to read the whole document initially to build the index, but if you then did a very large number of reads on subsets of a very large document it might pay off over time.
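
A skeletal DOM of file pointers could be sketched like this in Python, assuming a UTF-8 document whose root's children are separated only by whitespace or comments (no bare text between them). One streaming pass with the expat parser records the byte offset at which each child of the root starts; a later request seeks to one region and builds a tree for just that slice. build_skeleton and load_region are hypothetical names.

```python
import io
import xml.etree.ElementTree as ET
from xml.parsers import expat

def build_skeleton(f):
    """One streaming pass over binary file f.

    Returns [(tag, start, end)]: byte ranges for each direct child of the root.
    Each range runs from the child's '<' to the next child's '<' (or to the
    root's closing tag), so it may carry trailing whitespace or comments.
    """
    parser = expat.ParserCreate()
    depth = 0
    starts = []        # (tag, offset of '<') for each child of the root
    root_close = None  # offset of the root's closing tag

    def on_start(name, attrs):
        nonlocal depth
        depth += 1
        if depth == 2:  # a direct child of the root
            starts.append((name, parser.CurrentByteIndex))

    def on_end(name):
        nonlocal depth, root_close
        depth -= 1
        if depth == 0:  # the root itself is closing
            root_close = parser.CurrentByteIndex

    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.ParseFile(f)

    ends = [pos for _tag, pos in starts[1:]] + [root_close]
    return [(tag, start, end) for (tag, start), end in zip(starts, ends)]

def load_region(f, start, end):
    """Seek to one skeleton entry and parse only that slice."""
    f.seek(start)
    return ET.fromstring(f.read(end - start))

doc = io.BytesIO(b"<log><entry id='1'>a</entry>\n<entry id='2'><x/>b</entry></log>")
skeleton = build_skeleton(doc)
second = load_region(doc, *skeleton[1][1:])
```

Only the offsets stay in memory; each drill-down costs a seek plus a parse of one region, which is where the file system's random-access cost comes in.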

This assumes that even though you don't have the RAM for the whole document, you have the RAM for the DOM skeleton. Alternatively, you could build a skeleton up to a certain depth of the document, and only build the skeleton for its nether regions on request: have a "master index" stating roughly where to look, and build sub-indexes for regions you're interested in as required. It depends partly on what the cost of random file access is, I imagine; whether this is worth doing might depend on the efficiency of the file system, as well as the speed of the hardware.
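
The master-index-plus-sub-indexes variant might be sketched as below. It is a hypothetical design assuming a UTF-8 document with no bare text between sibling elements: index_children streams one byte range through expat to find the ranges of its element's direct children, and LazyNode caches that result the first time a region is drilled into. CHUNK is an arbitrary read size.

```python
import io
from xml.parsers import expat

CHUNK = 64 * 1024  # arbitrary read size for streaming a region

def index_children(f, start, end):
    """Stream bytes [start, end) of binary file f, which must hold one element
    (plus optional trailing whitespace/comments), and return absolute byte
    ranges [(tag, s, e)] for that element's direct children."""
    parser = expat.ParserCreate()
    depth, starts, close = 0, [], None

    def on_start(name, attrs):
        nonlocal depth
        depth += 1
        if depth == 2:  # a direct child of the region's element
            starts.append((name, start + parser.CurrentByteIndex))

    def on_end(name):
        nonlocal depth, close
        depth -= 1
        if depth == 0:  # the region's element is closing
            close = start + parser.CurrentByteIndex

    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    f.seek(start)
    remaining = end - start
    while remaining > 0:
        chunk = f.read(min(CHUNK, remaining))
        if not chunk:
            break
        remaining -= len(chunk)
        parser.Parse(chunk, False)
    parser.Parse(b"", True)

    ends = [s for _tag, s in starts[1:]] + [close]
    return [(tag, s, e) for (tag, s), e in zip(starts, ends)]

class LazyNode:
    """Skeleton node; its children are indexed only when first requested."""

    def __init__(self, f, tag, start, end):
        self.f, self.tag, self.start, self.end = f, tag, start, end
        self._children = None

    def children(self):
        if self._children is None:  # build this sub-index on demand, then cache it
            self._children = [LazyNode(self.f, t, s, e)
                              for t, s, e in index_children(self.f, self.start, self.end)]
        return self._children

def open_skeleton(f):
    """Master node covering the whole file (tag None); nothing is parsed yet."""
    size = f.seek(0, io.SEEK_END)
    return LazyNode(f, None, 0, size)

doc = io.BytesIO(b"<lib><shelf n='1'><book>A</book><book>B</book></shelf>"
                 b"<shelf n='2'><book>C</book></shelf></lib>")
root = open_skeleton(doc)
shelves = root.children()       # first scan: indexes only the shelves
books = shelves[1].children()   # later, smaller scan: just shelf 2
```

The first scan still reads the whole file, as Dominic notes, but each deeper level only re-reads the region being drilled into.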