LtU Forum

XQuery language design issues

I would like to call the attention of LtU readers to some ongoing language design efforts in the XQuery standards community. Regardless of my ridiculous opinions about current proposals (some of which I've shared below) I think we can all see that this community-based language design effort is likely to have huge impact: many important systems will be built using the language design that emerges. So, the zero-order point of this post is just to offer some hopefully orienting links, and then I'll opine a little bit, hoping to add something new:

It all starts with the XQuery 1.0 standard. If you are utterly new to it, like I was not long ago, there is a product manual that rewards a quick skim with the gist of what XQuery is about. It is an Introduction to Berkeley DB XML (an open source product).

XQuery is a pure, functional language taking XML types (hereafter "XDM" for XML Data model) as its value typing system. It (mostly) lacks higher-order functions. It is an expression language with expressions taking and returning named tuples or anonymous lists of values. Tail call optimization is provided. Closures and continuations are not. That is where recent developments start.

Three proposals have come to my attention. All are very nice in their ways (though, I opine below, each is a mistake). These proposals are:

1. XQuery Update Facility which introduces a monadic style of I/O to XQuery, mainly by adding new value types which functions may return (the type of "pending update" values).

2. XQueryp which introduces sequencing operators (operators to control execution order in this formerly pure language).

2. (D. Chamberlain, K. Beyer, L. Colby, F. Ozcan, H. Pirahesh, and Y. Xu) Extending XQuery for Analytics. Proceedings of the 2005 ACM SIGMOD Conference, Baltimore, June 2005. which adds an interesting ad hoc kind of list comprehension to XQuery, inspired by SQL's "GROUP BY" and related constructs.

So, free of my opinions: it seems that there's a fairly serious effort afoot to turn XQuery into a general purpose systems programming language for web applications. I sense from the literature that the crossover of db-expert and language design experts hasn't been all that strong, in this area. I hope that among the audience of LtU, some language design experts might help change that.

So, my opinions, which are naturally unjustifiably critical and self-promoting:

I think that the proposed sequencing constructs and update facility are superfluous. Without any changes at all to XQuery 1.0 I was able to add monadic I/O along with a generalization of closures and continuations in XQVM. My solution is a non-intuitive way to achieve the same aims but I'm finding that it works out very nicely.

The addition of a "group by" construct to XQuery seems a mistake to me. It is just syntax -- it can be trivially translated into strict XQuery 1.0 in a way that implementations should have little difficulty optimizing. The literature around the issue has just been overlooking that transformation. (They are just about all assuming that the transformation should use the "fn:distinct-values" operator and the optimizer will have to cope with that. If they tried solving the same problems without using that operator, they'd probably discover the transform I have in mind.)

My opinions aside: what should really happen to XQuery? What are other takes on these new developments?

On the Importance of Purity

I've learned a lot from LTU over the past few years, and I'm very grateful for all the discussions and the patience. LTU has inspired me to create my own language, of which I only have an interpreter so far. To help organize my thoughts, and to spread the word about advanced programming languages to a wider audience, I've written a post explaining some poorly understood features (outside of academia or LTU at least) which I think will become increasingly important in the coming years.

On the Importance of Purity:

The benefits of advanced programmings languages are sometimes difficult to grasp for every day programmers. It's sometimes hard to understand the features of such languages and how they relate to industrial software development, especially since the arguments are couched in terms such as "referential transparency", "totality", "side-effect-free", "monads", "non-determinism", "strong static typing", "algebraic data types", "higher-order functions", "laziness/call-by-need", and so on.

Many of these features are attributed to "pure" languages, but purity can be hard to understand in the context of everyday programming. I will explain the importance of a number of these features and how they impact the everyday programmer's life.

The explanations aren't rigourous, and perhaps even the distinctions I draw overlap somewhat in the real world, but I'm hoping it's an accessible intro to newbies; I'm still one myself after all! I won't post anymore on the subject here unless others think it's sufficiently on-topic, but I welcome any corrections or suggestions to my post! Post there if it's off-topic for LTU. This concludes my self-promotional message. :-)

Different results in the unit root test. Why?

Situation:
I had tired a 1000-data generated by random error(i.i.d.), then I sub it into different unit root tests. I got different results among the tests. The following are the test statistics I got:

For R project:
adf.test @ tseries ~ -10.2214 (lag = 9)
ur.df @ urca ~ -21.8978
ur.sp @ urca ~ -27.68
pp.test @ tseries ~ -972.3343 (truncation lag =7)
ur.pp @ urca ~ -973.2409
ur.kpss @ urca ~ 0.1867
kpss.test @ tseries ~ 0.1867 (truncation lag =7)

For MATLAB:
(adf test) ~ -0.43979

Questions:
1. Tests under same test name, say Phillips-perron test (pp.test & ur.pp), they have different test statistics. Why?
2. Don't the Phillips-perron test based on the Dickey-Fuller distribution table? How the value being so negative (-9xx)?
3. What is truncation lag? Is it the same with lag terms?

The simplest mechanism with Turing-equivalent power to date ...

From Stephen Wolfram's announcement:

But today I am thrilled to be able to announce that after only five months the prize is won--and we have answer: the Turing machine is in fact universal!

...

We plan an official prize ceremony in a few weeks--fittingly enough, at Bletchley Park, where Alan Turing did his wartime work.

But for now I'm just thrilled to see such a nice piece of science come out.

It's a very satisfying way to spend $25,000.

The universal Turing machine in question is a cellular automaton with two states and three colors.

Anybody at OOPSLA?

I just arrived in Montreal. I've been too busy the past few weeks to post anything sooner, but I'd love to meet up with other LtUers in the area this week... (I assume any actual human can figure out easily enough how to get in touch with me.)

In Honor of Per Brinch Hansen (1938-2007)

Prof. Per Brinch Hansen passed away in July.

Let's raise a toast to Prof. Hansen's life, his manifold contributions to computer science and the many delightful essays on concurrency .

Microsoft and PLT

A couple of Microsoft specific pointers that might be of interested to the LtU crowd:

Don Syme's langauge F# is being productized. F# is an OCaml derivative for the .NET platform.

Channel9 has a number of interesting video interviews on topics related to programming languages. This one, with Erik Meijer and Dave Thomas is particularly interesting.

[ANN] Call for Speakers - Code Generation 2008

With an emphasis on providing practical advice, Code Generation 2008 is the ideal opportunity for architects, developers and others to understand how to benefit from emerging tools and technologies in the broad area of Code Generation.

Call for Speakers:
Submission Deadline: Friday January 18th 2008

We are currently seeking high-quality session proposals covering topics in model-driven software development (including Software Factories, Model-Driven Architecture (MDA), Domain-Specific Languages (DSLs), Generative Programming, Software Product Lines and related areas).

Sessions could cover topics such as:
- Tool and technology adoption
- Code Generation and Model Transformation tools and approaches
- Defining and implementing modelling languages
- Domain Analysis and Domain Engineering
- Language evolution and modularization
- Meta Modelling
- Runtime virtual machines versus direct code generation

Real-world case studies based on any aspect of these and related approaches are particularly encouraged although more theoretical sessions are also welcome.

Take part in Code Generation 2008 and find out why industry observers think that Code
Generation is the next evolutionary step in Software Development.

Accepted speakers have their conference fees waived.

Code Generation 2008 takes place in Cambridge, UK from 25th - 27th June 2008

For more information on proposing a session please visit:
http://www.codegeneration.net/conference/speak.php

What people said about our previous conference - Code Generation 2007:

"I've been working in domain-specific modelling for a dozen years … and in this time this has been the highest-quality conference on this topic that I've been to - and I've been to a few."

"The combined—for that matter, individual—expertise present was remarkable, and presented a tremendous opportunity for knowledge exchange."

"The presentations were all top quality, making it often difficult to decide between the concurrently running sessions. The wealth of MDD knowledge present at the event was impressive, not only from the presenters, but from the other delegates as well."

ErlOCaml

ErlOCaml, a "tight bridge between Erlang and OCaml".

Found this mentioned in the "Erlang and OCaml" talk in the proceedings of the ACM SIGPLAN Erlang Workshop 2007.

As the talk-slides say, there is "No particular progress yet", but it is an interesting idea.
Here is the first slide:

Why???

  • Erlang is very good at concurrency
  • Not so good at raw sequential processing
  • Strong momentum in systems programming
  • OCaml has a reputation for speed
  • No particular momentum in concurrency programming
  • We should try to lower the barriers between different FP languages

Any multi-threaded interpreters?

Are there any thread-safe language interpreters out there that run multi-threaded code? Python uses a global interpreter lock to run threads one at a time, though it does some meager preemption. I think I read somewhere that Erlang is actually single-threaded, though it manages zillions of Erlang "processes". Is that still true?

XML feed