## XQuery language design issues

I would like to call the attention of LtU readers to some ongoing language design efforts in the XQuery standards community. Regardless of my ridiculous opinions about current proposals (some of which I've shared below), I think we can all see that this community-based language design effort is likely to have a huge impact: many important systems will be built using the language design that emerges. So, the zero-order point of this post is just to offer some hopefully orienting links, and then I'll opine a little bit, hoping to add something new:

It all starts with the XQuery 1.0 standard. If you are utterly new to it, like I was not long ago, there is a product manual that rewards a quick skim with the gist of what XQuery is about. It is an Introduction to Berkeley DB XML (an open source product).

XQuery is a pure, functional language taking XML types (hereafter "XDM", for the XQuery 1.0 and XPath 2.0 Data Model) as its value typing system. It (mostly) lacks higher-order functions. It is an expression language with expressions taking and returning named tuples or anonymous lists of values. Tail call optimization is provided. Closures and continuations are not. That is where recent developments start.

Three proposals have come to my attention. All are very nice in their ways (though, I opine below, each is a mistake). These proposals are:

1. XQuery Update Facility which introduces a monadic style of I/O to XQuery, mainly by adding new value types which functions may return (the type of "pending update" values).

2. XQueryP, which introduces sequencing operators (operators to control execution order in this formerly pure language).

3. Extending XQuery for Analytics (D. Chamberlin, K. Beyer, L. Colby, F. Ozcan, H. Pirahesh, and Y. Xu; Proceedings of the 2005 ACM SIGMOD Conference, Baltimore, June 2005), which adds an interesting ad hoc kind of list comprehension to XQuery, inspired by SQL's "GROUP BY" and related constructs.
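To make the "pending update" idea in the first proposal concrete, here is a minimal sketch in Python (a stand-in for XQuery, chosen only because it is easy to run). It assumes a toy document modelled as a dictionary and a made-up encoding of updates as tuples; `add_surcharge`, `discontinue_cheap`, and `apply_updates` are hypothetical names, and none of this is the Update Facility's actual syntax or its full semantics. The point is only that the query functions stay pure: they return descriptions of updates, and a separate step applies them afterwards.

```python
# Toy model: a "document" is a dict, and a pending update is a tuple
# (operation, key, value). Both encodings are assumptions for illustration.

def add_surcharge(doc, amount):
    """Pure function: describes replacements, performs none of them."""
    return [("replace", name, price + amount) for name, price in doc.items()]

def discontinue_cheap(doc, threshold):
    """Pure function: describes deletions, judged against the original doc."""
    return [("delete", name, None) for name, price in doc.items() if price < threshold]

def apply_updates(doc, pending):
    """Apply all pending updates to a copy, after evaluation has finished."""
    result = dict(doc)
    for op, key, value in pending:
        if op == "replace":
            result[key] = value
        elif op == "delete":
            result.pop(key, None)
        else:
            raise ValueError(f"unknown update operation: {op}")
    return result

catalog = {"pen": 2, "book": 15, "lamp": 40}
# Both functions see the same unmodified snapshot of the catalog.
pending = add_surcharge(catalog, 1) + discontinue_cheap(catalog, 5)
updated = apply_updates(catalog, pending)
print(updated)
```

Because both functions read the same snapshot, "pen" is discontinued on the basis of its original price even though a replacement for it is also pending.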

So, free of my opinions: it seems that there's a fairly serious effort afoot to turn XQuery into a general purpose systems programming language for web applications. I sense from the literature that the crossover between database experts and language design experts hasn't been all that strong in this area. I hope that among the audience of LtU, some language design experts might help change that.

So, my opinions, which are naturally unjustifiably critical and self-promoting:

I think that the proposed sequencing constructs and update facility are superfluous. Without any changes at all to XQuery 1.0 I was able to add monadic I/O along with a generalization of closures and continuations in XQVM. My solution is a non-intuitive way to achieve the same aims but I'm finding that it works out very nicely.

The addition of a "group by" construct to XQuery seems a mistake to me. It is just syntax -- it can be trivially translated into strict XQuery 1.0 in a way that implementations should have little difficulty optimizing. The literature around the issue has just been overlooking that transformation. (They are just about all assuming that the transformation should use the "fn:distinct-values" operator and the optimizer will have to cope with that. If they tried solving the same problems without using that operator, they'd probably discover the transform I have in mind.)
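For readers who haven't seen it, the translation that relies on "fn:distinct-values" has a simple shape: compute the distinct keys, then re-scan the whole input once per key. A minimal sketch of that shape follows, written in Python as a stand-in for XQuery so that it is easy to run. The names `distinct_values` and `group_by_distinct` and the `sales` data are illustrative, not taken from any of the papers, and this sketch keeps keys in first-occurrence order for readability, whereas XQuery leaves the order returned by fn:distinct-values up to the implementation.

```python
def distinct_values(values):
    """Order-preserving de-duplication, standing in for fn:distinct-values."""
    seen = []
    for v in values:
        if v not in seen:
            seen.append(v)
    return seen

def group_by_distinct(items, key):
    """For each distinct key, re-filter the entire input sequence."""
    return [
        (k, [item for item in items if key(item) == k])
        for k in distinct_values(key(item) for item in items)
    ]

sales = [("east", 10), ("west", 5), ("east", 7), ("north", 3)]
groups = group_by_distinct(sales, key=lambda s: s[0])
print(groups)
```

The repeated filtering is the part an optimizer would have to recognize and collapse into a single pass.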

My opinions aside: what should really happen to XQuery? What are other takes on these new developments?

### postscript

A useful "point of view" is to conceive of XQuery as the assembly language of a very high level graph reduction virtual machine.

Starting from that point of view, a lot of the fancy ideas that people discuss on LtU take on a strange new relevance, given XQuery's "economic" position.

It's a bit like the JVM, in this point of view, but for graph reduction rather than imperative object-oriented stuff.

-t

### relation to cat

XQuery looks at first glance like it has blocks and variables but, really, those are just variable-free expressions that all take and return either lists of anonymous values or tuples of named values. It is, in some sense, a (slightly statically typed) concatenative language in disguise.

-t

### I'm a little confused

I believe there are interesting design issues around XQuery, and I generally agree that the languages and databases people haven't always worked together very effectively. But why post this here? Why not take these concerns to the XQuery people? For example, you say:

> The literature around the issue has just been overlooking that transformation. (They are just about all assuming that the transformation should use the "fn:distinct-values" operator and the optimizer will have to cope with that. If they tried solving the same problems without using that operator, they'd probably discover the transform I have in mind.)

No offense, but if you have a particular better transformation in mind, why not just tell them? And if you have in mind a better way to think of XQuery in general, why not share that with them as well?

It sort of appears that you're hoping that an "authority on languages" will step in and clear things up, but I don't think it's really likely to happen. I suspect it's up to you to justify your opinions and make them convincing to the standards people.

### re confusion

> Why post this here?

Three reasons: First, I think that the XQuery standard has enough socio-political importance (given who's using it and how) that the more qualified attention is paid to it, the better. Second, based on discussions and observations, it is my impression that many people who "ought" to take an interest in XQuery (for some definition of "ought") are not doing so. So, I am trying to avoid a prohibited "advocacy" post which argues that people should use XQuery, but am hoping to help signal its importance in the market and the language theory issues that are currently central to its evolution. Third, of course, I have my own language design work that I pimped a little bit, because I don't directly know many people to talk with about it.

> No offense, but if you have a particular better transformation in mind, why not just tell them?

It's no secret -- I just didn't want to exhibit it in a post ostensibly aimed at an audience that included many not already familiar with XQuery. Also, since it isn't that hard, I didn't mind making a little puzzle of it. Another couple of hints are that you have to use tail recursion and you have to implement some data structures in XML (e.g., an associative table stored as a tree). Another hint is to use a generator pattern in your tail recursive function ... declare function local:grouper ($sequence-to-group,$accumulation-of-groupings-so-far).
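One possible reading of these hints, sketched in Python as a stand-in for XQuery so that it can be run directly: a tail-recursive `grouper` walks the sequence once, threading an accumulator that is an immutable associative table stored as a binary search tree. This is a guess at the general shape, not a reconstruction of the exact transform. The tree encoding as tuples and the helper names `insert` and `in_order` are assumptions made for illustration. Note also that Python does not eliminate tail calls, so this form only suits short inputs.

```python
# The accumulator is a persistent (never mutated) binary search tree.
# A node is a tuple (key, members, left, right); None is the empty tree.

def insert(tree, key, item):
    """Return a new tree in which item has been added to key's group."""
    if tree is None:
        return (key, (item,), None, None)
    k, members, left, right = tree
    if key == k:
        return (k, members + (item,), left, right)
    if key < k:
        return (k, members, insert(left, key, item), right)
    return (k, members, left, insert(right, key, item))

def grouper(sequence_to_group, groupings_so_far, key):
    """Tail-recursive single pass: consume the head, extend the accumulator."""
    if not sequence_to_group:
        return groupings_so_far
    head, rest = sequence_to_group[0], sequence_to_group[1:]
    return grouper(rest, insert(groupings_so_far, key(head), head), key)

def in_order(tree):
    """Flatten the tree into a list of (key, members) pairs in key order."""
    if tree is None:
        return []
    k, members, left, right = tree
    return in_order(left) + [(k, list(members))] + in_order(right)

sales = (("east", 10), ("west", 5), ("east", 7), ("north", 3))
groups = in_order(grouper(sales, None, key=lambda s: s[0]))
print(groups)
```

Each item is visited once and no distinct-values step is needed; the keys come out sorted as a side effect of the tree layout.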

It's *that* thing they are missing, if you see what I mean. Now doesn't it seem like they could really benefit from a random assault of PLTheorist attention?

-t

### so (confusion II):

Why here? Partly, also, I'm hoping to find some other people to work with on this stuff, as a team, of one form or another. But that's always quirky and a long shot, of course.

-t

### An outsider

I haven't been following XQuery for quite a while, but as far as I remember some pretty well known people from the PLT world were involved, no?

### I don't know the history but, the built environment...

XQuery 1.0 absolutely exudes PLT brilliance. That's why these later proposals surprise me, but for the fact that I see tricks that make them needless. Finding a couple of tricks is an accident so I got lucky. Meanwhile, XQuery 1.0 is such a gem that I feel a bit protective towards it. Without having seen the tricks, I'd think the proposals I was critiquing were really good.

-t