Cω Preview Available

Cω (pronounced C-Omega) is an extension to C# in two areas: asynchronous concurrency (formerly known as Polyphonic C#) and XML data types (Xen). A preview compiler is now downloadable from MSR:

http://research.microsoft.com/research/downloads/

What's the commonality?

http://research.microsoft.com/Comega/ is the direct link to the Cω project and surrounding papers.

Though I wonder what Asynchronous Methods and Chords have to do with XML processing?

See Luca Cardelli's presentation

See Luca Cardelli's presentation, linked from the site.

I don't find it tremendously convincing myself; in the first instance because I don't really believe that there exists any such thing as "XML Data". Cardelli's discussion focusses on a labelled tree that's basically a simplified DOM (minus namespaces, entities, attributes etc). I don't have any problem with the idea that it might be convenient to use a labelled tree as a generic data structure, or with the idea that it might be useful to define a query language for pattern matching on labelled trees. But it does seem totally bass-ackwards to say that there is such a thing as "XML Data" that is inherently structured in that way.

For one thing, it's possible to give an XML representation of entirely different data structures, like directed labelled graphs (RDF/XML) or the entire contents of a relational database for that matter. Will a query language designed for labelled trees be useful for trying to perform queries on the XML serialization of the contents of an RDBMS? Beyond some fairly trivial cases, probably not.

I find it cognitively disorienting that Cardelli, who I am certain is both smarter and better informed than I am, should refer to database tables as "square" and therefore somehow unsuited to the management of "triangular" data. It's obvious that you can represent trees of arbitrary depth using relations (along with graphs, lattices and anything else you care to mention), just as it's obvious that you can represent relations as "XML Data".
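To make the point about trees in relations concrete, here is a minimal sketch, assuming a single hypothetical `node` table with a parent pointer per row. The table name, the labels and the helper functions are invented for illustration. A recursive query walks the tree to any depth.

```python
import sqlite3

# Hypothetical example data: a small tree stored as (id, parent_id, label) rows.
ROWS = [
    (1, None, "tv-guide"),
    (2, 1, "channels"),
    (3, 2, "channel1"),
    (4, 2, "channel2"),
    (5, 3, "programme"),
]


def build():
    """Create an in-memory database holding the tree as one relation."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER, label TEXT)"
    )
    conn.executemany("INSERT INTO node VALUES (?, ?, ?)", ROWS)
    return conn


def descendants(conn, root_id):
    """Return (depth, label) for every node below root_id, at any depth."""
    query = """
        WITH RECURSIVE sub(id, label, depth) AS (
            SELECT id, label, 1 FROM node WHERE parent_id = ?
            UNION ALL
            SELECT node.id, node.label, sub.depth + 1
            FROM node JOIN sub ON node.parent_id = sub.id
        )
        SELECT depth, label FROM sub ORDER BY depth, label
    """
    return conn.execute(query, (root_id,)).fetchall()


if __name__ == "__main__":
    for depth, label in descendants(build(), 1):
        print("  " * depth + label)
```

Nothing here depends on the tree having a fixed depth; the same table would hold a graph if a separate edge relation replaced the single parent column.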

The only use I can see for an "XML Datatype" is as a way of representing the constraints defined in a DTD or XML Schema. Even then, if you happen to be working with RDF you need another schema language again to capture the real contours of the underlying model.

I just don't understand why anyone who knew better would go along with such an egregious confusion of data and representation. You don't hear people talking about "CSV Data", do you?

I should point out

That the expression "XML Data" nowhere appears in Cardelli's presentation. But he does seem to regard data formatted using an XML syntax as having an intrinsically "triangular" structure.

XML vs PL schema

Cardelli does advertise his research (that's part of his job, after all) and has found an "ideal topic for research": integrating XML schemas with PL data structures. He states that PL data has traditionally been trees (a half truth, actually), but based on tree matching rather than tree automata (as XML schemas are). I agree with that. And you know what? I prefer traditional PL structures.

I am happy not to use XML, except for integration with external systems, as a data serialization file format (viewed neatly in a browser) and for a bit of marketing hype ;-)

I work in R&D, and to specify data schemas, we made a natural schema language that models data in "traditional" tree form. It is fundamentally simpler than XML and its schemas, because
a) there are no unions, there are labelled variants instead
b) arrays are explicit, not inlined like (a?, b?, a)*
This implies that we don't need complicated tree-automata parsing techniques and our schemas naturally map onto PL data structures and collections.
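The schema language described above is not shown in this thread, so the following is only a sketch of the general idea, with every name (`Variant`, `Record`, `DECODERS`, the "text" and "number" labels) invented for illustration. It assumes a labelled variant is a (label, payload) pair and an array is an ordinary list, so decoding is a single pass that dispatches on the label.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Variant:
    label: str  # the label alone decides which case this is
    value: Any


@dataclass
class Record:
    name: str
    items: List[Variant]  # an explicit array, not a content model like (a?, b?, a)*


# Hypothetical decoders, one per variant label.
DECODERS = {"text": str, "number": int}


def decode_variant(raw):
    """Decode a [label, payload] pair by looking the label up directly."""
    label, payload = raw
    if label not in DECODERS:
        raise ValueError(f"unknown variant label: {label!r}")
    return Variant(label, DECODERS[label](payload))


def decode_record(raw):
    """Decode a plain dict into a Record in one pass, with no backtracking."""
    return Record(raw["name"], [decode_variant(item) for item in raw["items"]])


if __name__ == "__main__":
    print(decode_record({"name": "r", "items": [["text", "hi"], ["number", "42"]]}))
```

Because each value announces its own label, the decoder never has to guess which alternative of a content model it is looking at.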

Our schema can be used to model and store data wherever one would use XML. For the time being, we don't have tools similar to XSLT, XPath, XQuery, but they are a mixed blessing. I think that would be an even greater topic for research.

I don't agree

I don't agree with objections raised above regarding "XML data" etc.

First, I think the integration between programming languages and outside data sources is a real issue that most languages don't really try to solve. Recall our recent thread about database abstraction layers. Whatever you think about the issues, the fact that these sorts of questions arise should tell us there's a problem to be solved.

Second, "XML" as a data representation/layout scheme is, of course, nothing special. Other representations are possible. So? Someone is trying to design a better way to handle data (if it's really better I am not sure yet), and XML is the standard de jour. It makes sense to concentrate your efforts on XML these days.

Finally, XML is not just data representation (cf. "CSV"). DTD/Schemas make it easier to integrate XML at the type system level, and this makes the integration with the rest of the language (a) easier and (b) more productive.

Du jour / de jure

I would comment that I use XML fairly extensively to accomplish fairly routine programming/data exchange tasks; I'm always happy to see more useful tools for working with XML, just because there are already useful tools that I'm already using. If everything were S-expressions, I'd be happy that way too. In fact, once we got past the initial parsing stage - which is what many of those useful tools are there to take care of for me - I doubt I'd notice that much of a difference.

XML is general enough to represent any type of data structure, but so for that matter is CSV. Anything that allows you to construct strings of tokens meaning "one" and "zero" can be used to represent anything else; obviously this doesn't mean all formats are equivalent for all purposes, but it does suggest that fitness for purpose is relative rather than inherent in any particular format. If we choose to format data in one way rather than another, presumably we do so for one or more reasons.

Based on some common criteria (ease of syntax, breadth of adoption), XML is a tolerably good match for a particular broad range of cases (although it is verbose and, yes, it does matter that it's verbose - it's not a deal-breaker, but it is...disappointing, somehow). I suppose we could agree to call those cases "XML Data", but it's at best a slightly misleading shorthand and at worst a vector for the propagation of some serious confusions.

I don't think XML is an especially good "match" for RDF graphs, say, and it's not at all a good match for relational data (which is neither "flat" nor "square" but multidimensional). Simply put, there is more to data than trees, and while the XML toolset can accommodate this "more", it's often less than helpful when it comes to managing, verifying or automatically processing it. Programming interfaces designed around XML's syntax will ease the flow of certain sorts of information, for which relief much thanks, but what we are being promised is more than that: a generality that one could only believe in if one consented to forget about (or work silently around) all of the cases that don't fit.

I don't agree too

There is such a thing as "XML Data". It's the only format in which everybody finds it easy to express their data. I find it amazing that XML is so often dismissed as syntax or as messy, while there's obviously something interesting to it that deserves to be researched.

I think the power of XML Data is that every element is both a hashmap and a container. Not as a choice, but both at the same time. That's a rather unique power. That's also why so many people say, "oh, but xml is just Y". But every time there's a different Y.

XPath shouldn't be forgotten either. It makes addressing the data a breeze compared to, e.g., SQL. "channel/item[description]/title" means "give me every title for every item that has at least one description for every channel".
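As a small illustration of that expression, here is a sketch using Python's standard `xml.etree.ElementTree`, which supports the limited XPath subset needed here. The `FEED` document and the function name are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed: two channels, one item lacks a description.
FEED = """<rss>
  <channel>
    <item><title>With description</title><description>text</description></item>
    <item><title>Without description</title></item>
  </channel>
  <channel>
    <item><title>Second channel</title><description>more</description></item>
  </channel>
</rss>"""


def titles_with_description(xml_text):
    """Return the title text of every item that has a description child."""
    root = ET.fromstring(xml_text)
    # The path is evaluated relative to the root element; the implicit
    # "for every" visits every channel and every matching item.
    return [t.text for t in root.findall("channel/item[description]/title")]


if __name__ == "__main__":
    print(titles_with_description(FEED))
```

The item without a description is skipped, and no explicit loop over channels or items appears in the query.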

XML is a bit messy. For example attributes allow only strings as values, so the hashmap often ends up in the elements and the container option is lost. But I think Microsoft is right to stay as close to XML as possible, it will be the most valuable for their customers.

With Xen Microsoft is focussing on the power of schemas. With Moiell I'm focussing on the power of XPath and steering a bit further from XML, because I think the inherent "for every" in XPath is more interesting than describing complex sequences of elements (which I think is more a bug of XML than a feature).

The power of XPath

Right at the moment I'm working on a web app that permits two views of each resource in the domain: one with a ".xml" suffix, that is actually a short RDF/XML document describing the resource, and one with a ".htm" suffix that is an XHTML page generated from the XML source via XSLT.

This only works at all because the RDF/XML is generated by me and I know which of the myriad possible XML representations of the RDF triples that describe the resource is actually going to be used. So I can write an XPath expression to match, say, "channels/rdf:List/rdf:li/channel/" and have an XSLT template output XHTML for each channel. This would break immediately if instead of

<?xml version="1.0"?>
<rdf:RDF
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns="http://testuri.org#">
<tv-guide>
   <channels>
      <rdf:List>
         <rdf:li>
            <channel rdf:ID="channel1"/>
         </rdf:li>
         <rdf:li>
            <channel rdf:ID="channel2"/>
         </rdf:li>
      </rdf:List>
   </channels>
</tv-guide>
</rdf:RDF>

we had the following:

<?xml version="1.0"?>
<rdf:RDF
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns="http://testuri.org#">
<tv-guide>
   <channels>
      <rdf:List>
         <rdf:li rdf:resource="#channel1"/>
         <rdf:li rdf:resource="#channel2"/>
      </rdf:List>
   </channels>
</tv-guide>
<channel rdf:ID="channel1"/>
<channel rdf:ID="channel2"/>
</rdf:RDF>

If you feed it to the W3C RDF validator, this will give the same triples (albeit in a different order), and the same graph, but the XPath expression to get the same results would look totally different. If I had to accept arbitrary RDF/XML and transform it with XPath/XSLT, I'd have to deal with a fairly considerable number of possible variations. Probably the only way to do it would be to go through a normalisation stage first.
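The breakage can be reproduced outside XSLT. The sketch below uses Python's standard `xml.etree.ElementTree` and binds the default namespace to a `tv` prefix, which is an assumption made only so the path can be written down. The same path finds both channels in the first serialisation and nothing in the second.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# "tv" is an illustrative prefix for the documents' default namespace.
NS = {"rdf": RDF, "tv": "http://testuri.org#"}

NESTED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://testuri.org#">
<tv-guide><channels><rdf:List>
  <rdf:li><channel rdf:ID="channel1"/></rdf:li>
  <rdf:li><channel rdf:ID="channel2"/></rdf:li>
</rdf:List></channels></tv-guide>
</rdf:RDF>"""

REFERENCED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://testuri.org#">
<tv-guide><channels><rdf:List>
  <rdf:li rdf:resource="#channel1"/>
  <rdf:li rdf:resource="#channel2"/>
</rdf:List></channels></tv-guide>
<channel rdf:ID="channel1"/>
<channel rdf:ID="channel2"/>
</rdf:RDF>"""

# Path written against the nested form, relative to the rdf:RDF root.
PATH = "tv:tv-guide/tv:channels/rdf:List/rdf:li/tv:channel"


def channel_ids(xml_text):
    """Return the rdf:ID of every channel element the fixed path reaches."""
    root = ET.fromstring(xml_text)
    return [c.get(f"{{{RDF}}}ID") for c in root.findall(PATH, NS)]


if __name__ == "__main__":
    print(channel_ids(NESTED))
    print(channel_ids(REFERENCED))
```

Both strings describe the same two channels, but only the first has channel elements where the path expects them.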

I guess my question is, are the above examples "XML Data"?

Incidentally

This may also be the reason why I've never seen a really satisfactory XSLT stylesheet to convert XML schemata into a nice readable XHTML format. There are many valid XML schemata whose actual structure (in terms of the relationships between type declarations) is not a tree, because common type definitions are shared by multiple complex types.

Are XML schemata XML data?

I think I understand what you

I think I understand what you are trying to say, and your point is well taken. But it seems to me you are reading too much into the phrase "XML data".

You seem to want a unique representation, while XML doesn't really attempt to do any of that. The relational approach has the notion of "normalization," but even that doesn't imply a unique representation given that many (most?) data designs are not fully normalized.

I can see why this may cause problems, but I am not sure why this issue is relevant to the question of whether better integration between PLs and XML processing is possible.

Better integration is possible

And Xen is one (very good looking, props, kudos etc.) way of doing it. I don't have a big problem with that. What bothers me is the rationale for it, which seems to be to promote the XML toolset (XPath, XML Schema, possibly XQuery) to a data management and querying framework for something called "XML data". I feel this is misguided, although given the credentials of the people involved (I've read a few of Erik's papers, with great interest and gratitude) I feel decidedly uneasy about saying so.

The lack of a unique representation (in XML) for RDF graphs isn't a problem; the problem is that XPath only knows how to talk about trees, and is actually of little help when what your XML actually represents is a graph. Fine, you may say, use one of those nice RDF query languages instead - the right tool for the job (except that unlike XPath you can't currently use it with XSLT). But you then have to acknowledge that XPath is not the right tool, except in those cases where what your XML actually represents is a labelled tree - in other words, where the syntax of the data representation (nested tags and literal content) actually coincides with the structure of the data being represented.

The confusion I'm worried about is the confusion of data with representation, based on the fact that in some admittedly common cases they do happen to coincide in this way. Sjoerd talked about XPath being easier to use than SQL - well, yes it is, but at the cost of considerable generality (not that SQL isn't also specialised - and arguably deficient - in some ways). It's a trade-off, as always, and one needs to be clear about what's being traded for what. Even that isn't what bothers me. What bothers me is why.

The real brain-itch, for me, is still around this idea that you might allow a data format to determine how you think about data processing. Beyond the very earliest stages of the process - parse the inputs, get a tractable abstract representation of their contents (a typed value) - it's surely irrelevant whether it's XML or not. You can construct entire pipelines of components firing SAX events to be handled by other components to process your data without a single tag ever being emitted in anger.
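A sketch of such a pipeline, using the SAX interfaces in Python's standard library. The three components (`emit_channels`, `UppercaseFilter`, `ChannelCollector`) and the element and attribute names are hypothetical. Events pass from one component to the next as method calls, and no XML text is ever produced or parsed.

```python
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import AttributesImpl


class ChannelCollector(ContentHandler):
    """Final stage: record the id of every channel element it is told about."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def startElement(self, name, attrs):
        if name == "channel":
            self.ids.append(attrs.getValue("id"))


class UppercaseFilter(ContentHandler):
    """Middle stage: upper-case attribute values, then forward the event."""

    def __init__(self, downstream):
        super().__init__()
        self.downstream = downstream

    def startElement(self, name, attrs):
        upper = {key: value.upper() for key, value in attrs.items()}
        self.downstream.startElement(name, AttributesImpl(upper))

    def endElement(self, name):
        self.downstream.endElement(name)


def emit_channels(handler, ids):
    """First stage: fire SAX events directly from in-memory data."""
    handler.startDocument()
    handler.startElement("channels", AttributesImpl({}))
    for channel_id in ids:
        handler.startElement("channel", AttributesImpl({"id": channel_id}))
        handler.endElement("channel")
    handler.endElement("channels")
    handler.endDocument()


if __name__ == "__main__":
    collector = ChannelCollector()
    emit_channels(UppercaseFilter(collector), ["channel1", "channel2"])
    print(collector.ids)
```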

I agree

Beyond the very earliest stages of the process... it's surely irrelevant whether it's XML or not.

I agree that as internal format (i.e., data structure) XML isn't likely to be a good choice for most applications.

Details matter

The real brain-itch, for me, is still around this idea that you might allow a data format to determine how you think about data processing.

But some formats are better than others for reasoning. Consider the use of roman numerals vis-a-vis our current place-value system for simple integers. Consider Descartes' coordinate system for analytic geometry. What about the merits of using Hamilton's quaternion notation or Grassmann's product notation for various kinds of physical phenomena, or the use of complex numbers to represent sinusoidal waveforms?

To me the more fundamental question is: does XML really help you reason about your data or is it just some kind of roman-numeral system for the 21st century?

Given that XML was originally

Given that XML was originally intended for document formatting (hence such oddities as the ability to insert tags in the middle of literal text), it would be a fortunate coincidence indeed if it happened to be useful for reasoning about data.

I think some formats are better than others for human reasoning, because their symbols are concise and expressive and some basic reasoning steps can be expressed as simple symbolic manipulations: see x + y = z, know without having to think about it that x + y - z = 0. But I'm not sure it makes any difference at all when it's a computer doing the processing.

rdf

well I'm an anti-rdf guy, I don't think rdf is a good match for rdf graphs, in other words if the essential virtue of the language is that it describes a directed graph then you'd think they would have come up with a better syntax to do that.

that said
"So I can write an XPath expression to match, say, "channels/rdf:List/rdf:li/channel/" and have an XSLT template output XHTML for each channel. This would break immediately if instead of"
probably means that you shouldn't have an xpath expression like the one above in your xslt, it should be something like this:
<xsl:template match="rdf:List[rdf:li]">
<ul><xsl:apply-templates/></ul>
</xsl:template>

<xsl:template match="rdf:li[tv:channel][not(@rdf:resource)]">
<li><xsl:apply-templates/></li>
</xsl:template>

<xsl:template match="rdf:li[@rdf:resource][ancestor::tv:channels]">
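<!-- rdf:resource is "#channel1" while rdf:ID is "channel1", so strip the leading "#" before comparing -->
<xsl:variable name="channelid" select="substring-after(@rdf:resource, '#')"/>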
<li><xsl:apply-templates select="//tv:channel[@rdf:ID = $channelid]"/></li>
</xsl:template>

I don't know what you want to do with tv:channel when you get to it of course. but that would get both variations shown, so does that mean it's xml data now?

(note I edited this to make sure rdf:li with rdf:resource attributes were evaluated specifically if they had a tv:channels ancestor.)

There's a choice of syntaxes for RDF

Although I think only RDF/XML is "standard", RDF != RDF/XML.

XPath/XSLT certainly starts to get convoluted when there are multiple possible paths expressing the same content. An interesting question: are there any queries over any RDF graph that it's actually impossible to translate without ambiguity into an XPath expression over some (or all) RDF/XML representation(s) of that graph?

rdf != rdf/xml

Well, as you say, rdf/xml is 'standard'; I can think of RDF products that support it and that do not support n-triples. Also I do not like the whole "RDF is not xml" line, given that it is the argument generally used to shut people up when it is pointed out that the RDF/XML syntax sucks. Funnily enough, when the shutting-up argument is used, the people using it never acknowledge that it is specifically RDF/XML that they urge everyone to use in all their applications, to embed in their web pages, and that it provides just about the only examples they ever show. But anyway, I'm not gonna rant on this subject.

I don't think there would be any such graph, but then I cannot prove it; it's just a gut feeling.

Well,

Nobody choosing a syntax for RDF in a world in which XML did not yet exist would invent XML in order to have a good syntax for RDF. Actually, there are quite a few XML dialects about which something like this is true.

As soon as you've parsed the RDF/XML into an RDF graph / set of triples, you can forget it was ever XML. My feeling is, it's better to do this and then work with the graph/triples (in whatever form you choose to represent them internally) than it is to try to do whatever you wanted to do with the graph/triples by working with the XML.
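A sketch of what working with the triples can look like once parsing is done. The triples and predicate names below are written by hand for illustration and are not the output of any RDF parser. A query is just a pattern over (subject, predicate, object), and nothing in it refers to element nesting.

```python
# Hypothetical triples for the tv-guide example; "tv:hasChannel" is invented.
TRIPLES = {
    ("#guide", "tv:hasChannel", "#channel1"),
    ("#guide", "tv:hasChannel", "#channel2"),
    ("#channel1", "rdf:type", "tv:channel"),
    ("#channel2", "rdf:type", "tv:channel"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t
        for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


if __name__ == "__main__":
    # Every resource typed as a channel, however the XML happened to nest it.
    for subject, _, _ in match(TRIPLES, p="rdf:type", o="tv:channel"):
        print(subject)
```

Any RDF/XML serialisation that yields this set of triples gives the same answer, which is the property the XPath approach lacks.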

A losing bet

Suppose I were to bet you that I could create a representation of a graph containing the same information (plus some extra nodes, irrelevant to the query) that would break the above XSLT. I think I could win that one, assuming you accepted. (If you insist, I'll try to work out the details; basically, I'd exploit the fact that ancestor matches an ancestor at any depth in the tree, and I would create rdf:li elements with tv:channels ancestors that weren't the droids you were looking for).

If you were then to bet me that you could write some XSLT that would pass my new "test case", I imagine you could win that one too (I won't press you to work out the details).

Now suppose I were to bet you that for any XSLT you could write, I could write some XML to break it (still representing the same graph, plus extra irrelevant nodes of my choosing). I think that ultimately I would lose that bet, but not until you had effectively written an XSLT program to parse XML/RDF into a set of triples and evaluate queries over those triples. Maybe you wouldn't have to go quite that far, but that's the direction you'd have to be headed in. By that point, I'd argue, you would essentially have left the "XML model" far behind: you would no longer be thinking about trees and hierarchies, but would be implementing graph traversal algorithms in a functional language that just happened to be XSLT.

graph traversal

well I wouldn't have a problem with it, I've written a lot of graph traversal in xslt already as a matter of fact. Yes that is the direction one heads in.

There are a lot of xml formats out there that contain structures that are essentially graphs. A directed graph can be represented by a tree (which after all is what is being done with RDF/XML) just as a tree can be represented by a graph, which I would argue we are seeing a lot of in the rdf world now - rss 1.0 was perhaps an early example of unnecessary graphing.

So yes I would suppose that it is still in the xml model, however at the extreme of that model - where it starts to break down.

RDF v XML

The claims made for RDF are quite interesting in light of this discussion.

Nothing new under the sun...

The analogy between RDF and prolog was mentioned here a couple of times. I guess we can say that this gives us a programming model for RDF-like processing.

Add to that a programming model for hierarchical XML data, and you are in a better place.

It is often better to use language features that help solve specific types of problems (cf., expressiveness).

RDF/XML

It's interesting that this document tries to compare the RDF model with the XML model. My contention is that XML shouldn't really be thought of as having a model, that it should be seen as just a notation (among other possible notations) for whatever your model actually is (and if your model actually is some sort of labelled tree, then XML actually isn't a bad notation for it and DTDs aren't a bad way of describing some of its properties). That's a way of looking at it that is congruent with all of the things XML is used for that don't fit into the "XML data model", of which RDF/XML is a convenient example.

It occurs to me that I may be taking a naive and/or perverse position on the relationship between model and notation, or data and representation (e.g. assuming that there is a clear and obvious distinction to be made, and that failure to make that distinction in the way I'm accustomed to amounts to "confusion"; perhaps it's simply not as clear and obvious as all that). It's not something I've thought about in any depth before.

FWIW

My contention is that XML shouldn't really be thought of as having a model, that it should be seen as just a notation (among other possible notations) for whatever your model actually is

FWIW, my feeling is that XML should be regarded as a protocol, like TCP or two's complement. No one should ever have to look at XML; it should just be one of the formats of things passed between programs. Certainly we shouldn't be using it for reasoning, or for programming language syntax.

Re: FWIW

It appears that Tim Berners-Lee, the author cited above, agrees.

Re: RDF v XML

I found another nice, short comparison. I would appreciate more links of this type.