Revealing the X/O impedance mismatch

Ralf Lämmel and Erik Meijer. Revealing the X/O impedance mismatch.

When engaging in X/O mapping, i.e., when viewing XML data as objects or vice versa, one faces various conceptual and technical challenges -- they may be collectively referred to as the `X/O impedance mismatch'. There are these groups of challenges. (i) The XML data model and the object data model differ considerably. (ii) The native XML programming model differs from the normal OO programming model. (iii) The typical type systems for XML and objects differ considerably, too. (iv) Some of the differences between data models and between type systems are more like idiosyncrasies on the XML side that put additional burden on any X/O mapping effort. (v) In some cases, one could ask for slightly more, not necessarily XML-biased language expressiveness that would improve X/O mapping results or simplify efforts.

The present article systematically investigates the mismatch in depth. It identifies and categorizes the various challenges. Known mitigation techniques are documented and assessed; several original techniques are proposed. The article is somewhat biased in that it focuses on XML Schema as the XML type system of choice and on C# as the OO programming language of choice. In fact, XSD and C# are covered quite deeply. Hence, the present article qualifies as a language tutorial of `the other kind'.

This paper is over 100 pages, way longer than I have time to read at the moment. From a quick skim, it looks interesting and useful. If you manage to read the whole thing, do share your observations with us in the discussion group.

XSD, oh boy..

A concrete reason XSD-to-OO mapping is important is that Web services (largely written in Java and C#) rely on XSD as the common type system between them. Because XSD is so complex, creators of Web services tend to rely on this automatic mapping to avoid writing XSD by hand. That is, they write the service interface in their native language, let the tools automatically generate the corresponding XSD, and keep their fingers crossed that their system will interoperate with other systems.

I work at a software startup in the Web services space. We provide a tool which, among other things, dynamically generates a web interface for invoking messages against the service, based on WSDL + XSD (XML Schema).

Thank god for my undergraduate education in PL and type theory, because XSD is the most non-orthogonally expressive type system you can imagine. To say the problem of matching XSD to OO is challenging is misleading; XSD itself is challenging, near intractable. Some groups have tried to 'subset' XSD into a far simpler type system (which is met w/ much resistance). Others abandon it for, say, RelaxNG. The majority of us are stuck with it.

An interesting question to me is, with so much thought put into type systems, how does committee-ware like XSD come into being? A simple review of it should have highlighted the overlap problems, as well as the difficulty in implementing a tool which respects the specification to the fullest.

What can we as a community do? Can we set up a more formal critique method, and yet not be the committee itself? I'm very interested in this problem. As the industry re-invents the wheel every couple of years, it would be nice to provide an objective 'checkpoint'.

Sung to the Tune Of?

Is the title here cut off, and should it really be "I Read the XSD Today, Oh Boy," sung to the tune of "A Day in the Life"? :-)

Question

I have a question that I hope more knowledgeable people can answer. Before asking it, I have to say I have only had a very brief look at the paper, so I apologise beforehand if my question is silly.

Why do they think there is a mismatch between XML and OO? I have been using OO to map XML for years; there is a perfect match. Each XML node is represented by an object in a tree.

OO tends toward graphs not trees.

For simple datasets it's not much of an issue, especially if you can start at the XML end and code afterwards.

For more, read section 2 of the paper.

I'd disagree that OO tends

I'd disagree that OO tends towards graphs; it's certainly capable of it, but in most of the code I've worked with that goes between XSD and OO, this isn't that much of an issue. Business data tends to be more hierarchical, etc. It's certainly an excellent example, however, of the failing from OO to XSD.

You're right, tending towards

You're right, "tending towards" was the wrong phrase to use. Most code is hierarchical in my experience as well, but when it isn't, there's normally a good reason and you're forced to jump through hoops.

OO != Hierarchical != Relational

It's the same old OO database motivation conversation.

Keep in mind the mismatch of

Keep in mind the mismatch of note is between the XSD and OO type systems. Representing a tree of nodes in OO isn't hard.

I think these points are mentioned in the paper. First, consider going from an object graph to XML. There is no standard translation defined (how do you deal w/ cycles? You invent some id/ref mechanism. SOAP Encoding did this and more, and has been dropped like a rock).
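To make the cycle problem concrete, here is a minimal sketch of one home-grown id/ref scheme. The `Person` class and the `id`/`ref` attribute names are made up for illustration; this is not SOAP Encoding or any standard mapping, just the kind of thing a toolkit ends up inventing.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


# Hypothetical domain class. eq=False keeps comparison identity-based,
# since field-wise equality could recurse on a cyclic graph.
@dataclass(eq=False)
class Person:
    name: str
    friends: list[Person] = field(default_factory=list)


def to_xml(root: Person) -> ET.Element:
    """Serialize an object graph: emit each object once, then only refs."""
    ids: dict[int, str] = {}  # id(obj) -> XML id assigned on first visit

    def visit(p: Person) -> ET.Element:
        if id(p) in ids:
            # Already emitted: write a reference instead of recursing forever.
            return ET.Element("person", {"ref": ids[id(p)]})
        ids[id(p)] = f"p{len(ids) + 1}"
        elem = ET.Element("person", {"id": ids[id(p)], "name": p.name})
        for friend in p.friends:
            elem.append(visit(friend))
        return elem

    return visit(root)


alice = Person("Alice")
bob = Person("Bob")
alice.friends.append(bob)
bob.friends.append(alice)  # cycle: alice -> bob -> alice

xml_text = ET.tostring(to_xml(alice), encoding="unicode")
```

The resulting tree nests Bob inside Alice and ends in a `person` element carrying only `ref="p1"`. A reader on the other side has to know this convention to rebuild the graph, which is exactly the interop problem.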

Take an optional element that is nillable. How do you encode that in Java/C#? Most toolkits treat an optional element w/ simple string content as a string, so its absence is treated as null. Now let that element also be nillable... it's not so obvious how to encode all the states. Choices aren't easy to represent in OO either; you need to invent wrapper classes to tag the injection. That's another spot where element != object in terms of mapping. There's a good discussion in the paper about restriction/extension and the tension in trying to reconcile it in OO.
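Here is a rough sketch of the optional-plus-nillable states, using Python's standard ElementTree rather than any real binding toolkit. The `order`/`note` element names and the `Presence` enum are invented for the example; only the `xsi:nil` attribute and its namespace come from XML Schema.

```python
import xml.etree.ElementTree as ET
from enum import Enum

# Expanded name of the xsi:nil attribute as ElementTree reports it.
XSI_NIL = "{http://www.w3.org/2001/XMLSchema-instance}nil"


class Presence(Enum):
    ABSENT = "absent"  # element missing from the document (it is optional)
    NIL = "nil"        # element present, marked xsi:nil="true"
    VALUE = "value"    # element present with (possibly empty) content


def read_optional_nillable(parent, tag):
    """Return (Presence, text) for a child that is both optional and nillable."""
    child = parent.find(tag)
    if child is None:
        return Presence.ABSENT, None
    if child.get(XSI_NIL) in ("true", "1"):  # both are valid xs:boolean forms
        return Presence.NIL, None
    return Presence.VALUE, child.text or ""


XSI = 'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"'
DOCS = {
    "absent": "<order/>",
    "nil": f'<order {XSI}><note xsi:nil="true"/></order>',
    "empty": "<order><note></note></order>",
    "value": "<order><note>rush</note></order>",
}

results = {
    name: read_optional_nillable(ET.fromstring(doc), "note")
    for name, doc in DOCS.items()
}
# results["absent"] is (Presence.ABSENT, None)
# results["nil"]    is (Presence.NIL, None)
# results["empty"]  is (Presence.VALUE, "")
```

A plain nullable string field has only one "nothing" value, so a mapping that uses it has to collapse ABSENT and NIL (and sometimes the empty string too), and the round trip back to XML is no longer faithful.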

Entire products exist around solving some of the problems the 'impedance' has caused various toolkits, and committees are springing up to help define more standard XSD -> OO mappings.

Blaargh

What are you talking about here, really: writing your own set of classes for each new type of document you encounter, one class for each element name? Writing your own SAX Handler to parse them, create the objects and populate the data structures that you'll need to work with them? And more methods to convert the objects back to XML for output?

All that code you're writing--that's you working around the impedance mismatch between XML and objects.

It would be nice to autogenerate all that code, but the two models do not align nicely, hence the 100-page dissertation...
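For anyone who hasn't done it, this is roughly what that hand-written code looks like for even a tiny document type. It is a sketch using Python's standard `xml.sax`; the `Order`/`Item` classes and the `order`/`item` vocabulary are hypothetical.

```python
import xml.sax
from dataclasses import dataclass, field
from xml.sax.saxutils import quoteattr


@dataclass
class Item:
    sku: str
    quantity: int


@dataclass
class Order:
    id: str
    items: list = field(default_factory=list)


class OrderHandler(xml.sax.ContentHandler):
    """One branch per element name: this is the part you rewrite per schema."""

    def __init__(self):
        super().__init__()
        self.order = None

    def startElement(self, name, attrs):
        if name == "order":
            self.order = Order(id=attrs["id"])
        elif name == "item":
            # Attribute values arrive as strings; typing them is manual.
            self.order.items.append(
                Item(sku=attrs["sku"], quantity=int(attrs["quantity"]))
            )


def parse_order(text):
    handler = OrderHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.order


def order_to_xml(order):
    """And the reverse direction, also written by hand."""
    items = "".join(
        f"<item sku={quoteattr(i.sku)} quantity={quoteattr(str(i.quantity))}/>"
        for i in order.items
    )
    return f"<order id={quoteattr(order.id)}>{items}</order>"


src = '<order id="A-1"><item sku="X" quantity="2"/><item sku="Y" quantity="5"/></order>'
order = parse_order(src)
```

Two classes, one handler, and one serializer for two element names, with no validation and no handling of optional or repeated content beyond a list. Multiply that by every document type you meet.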

The type systems don't match up.

It's straightforward to represent an XML document with a tree of OO objects. It's harder to map an XML Schema type to a Java language type.

For one, XML Schema is regular-expression-like and I think that encourages people to define a data representation that ends up matching the concrete syntax instead of the abstract syntax. There are some languages that try to support "regular types", but Java doesn't.
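As a small illustration of the concrete-versus-abstract point, assume a hypothetical content model `(key, value)*` inside a `settings` element. The sketch below (standard ElementTree, all names invented) shows the flat shape the XML actually has next to the shape the data is meant to have.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance of the content model (key, value)*.
DOC = (
    "<settings>"
    "<key>host</key><value>example.org</value>"
    "<key>port</key><value>8080</value>"
    "</settings>"
)
root = ET.fromstring(DOC)

# Concrete-syntax view: one flat sequence of tagged children.
flat = [(child.tag, child.text) for child in root]


def group_pairs(parent):
    """Abstract-syntax view: recover the (key, value) pairs from the flat run."""
    children = list(parent)
    if len(children) % 2 != 0:
        raise ValueError("expected alternating key/value children")
    pairs = []
    for k, v in zip(children[0::2], children[1::2]):
        if k.tag != "key" or v.tag != "value":
            raise ValueError(f"unexpected sequence: {k.tag}, {v.tag}")
        pairs.append((k.text, v.text))
    return pairs


pairs = group_pairs(root)
# flat  has four entries: key, value, key, value
# pairs == [("host", "example.org"), ("port", "8080")]
```

Nothing in the instance document marks where one pair ends and the next begins; the grouping lives only in the regular expression. A mapping that follows the document literally hands you `flat`, and the pairing logic ends up in application code.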

I think the root cause of the mismatch between XML and {every-programming-language} is that things like DTD, XML Schema, and RelaxNG are just bad/broken type systems. The easy way out is to dump the current XML type systems and adopt a sensible one (I'm partial towards something based on algebraic data types).