Lambda the Ultimate

XLANG and WSFL: Syntactic Arsenic
started 7/18/2002; 6:13:14 AM - last post 7/21/2002; 1:44:09 PM
Dan Shappir - XLANG and WSFL: Syntactic Arsenic
7/18/2002; 6:13:14 AM (reads: 2097, responses: 14)
XLANG and WSFL: Syntactic Arsenic

(via Patrick Logan's Weblog)

It seems that the world has gone XML crazy lately. Because of the hype, it's not surprising that anything that can be, has been turned into XML. Still, XML is lousy to read and write. Programming language people have a term for making the syntax of a language pretty: syntactic sugar. XML is syntactic arsenic.

A rather exasperated commentary on where XML ought and ought-not be used.


Posted to critiques by Dan Shappir on 7/18/02; 6:14:45 AM

Adewale Oshineye - Re: XLANG and WSFL: Syntactic Arsenic
7/18/2002; 8:45:36 AM (reads: 1560, responses: 0)
Take a look at the Java Server Pages™ Standard Tag Library for another example of people re-inventing a programming language in tags. JSTL enables people (web developers and designers according to the specification) to generate Java code inside JSP supposedly without knowing Java. I really don't understand why people keep trying to invent programming languages aimed at non-programmers.

Dan Shappir - Re: XLANG and WSFL: Syntactic Arsenic
7/18/2002; 9:19:30 AM (reads: 1558, responses: 0)
I really don't understand why people keep trying to invent programming languages aimed at non-programmers.

Because many (most?) programmers don't know how to program, and almost all non-programmers don't know either.

Most of the Web, including much of the dynamic part, has been developed by non-programmers. Anything that can get this stuff to work despite lack of programming skills might be considered a "good thing".

Frank Atanassow - Re: XLANG and WSFL: Syntactic Arsenic
7/19/2002; 12:38:31 AM (reads: 1488, responses: 0)
You guys just don't grasp the spirit of XML. The truth is that, if anything, XML just doesn't take the mark-up concept far enough!

I've written a treatise on the subject.

jon fernquest - Re: XLANG and WSFL: Syntactic Arsenic
7/19/2002; 1:27:24 AM (reads: 1497, responses: 3)
I rejoice that standard XML ways of parsing things, including programming languages, are starting to be used extensively.

Most computer scientists and programmers I know who have been around for a while are of the same opinion as the author of this paper. XML, so what, it doesn't really do anything new. There is a lot of vacuous hype surrounding XML and it's not really very readable, especially for programming languages, e.g. XSLT. Agreed.

But it is the focus for a lot of work on transformation and analysis tools.

JavaML is a good example. The author of JavaML runs the JavaML through a lot of SGML tools to collect statistics. He also has XSLT style sheets that transform the JavaML back to Java.

The transformation and analysis tools could conceivably allow you to transform your program into any format you want: s-expressions, c-algol-java syntax, a GUI tree component, an annotated parse tree in Graphviz, a computer voice reading me the program in the shower, etc.
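As a rough illustration of that kind of transformation, here is a minimal Python sketch that renders an XML tree as an s-expression. The element names in SAMPLE are made up for illustration and are not JavaML's actual vocabulary; the output layout (an `@` list for attributes) is only loosely SXML-like, and tail text between elements is ignored.

```python
import xml.etree.ElementTree as ET


def to_sexp(elem):
    """Render an element as (tag (@ (name "value") ...) "text" children...)."""
    parts = [elem.tag]
    if elem.attrib:
        attrs = " ".join(f'({k} "{v}")' for k, v in sorted(elem.attrib.items()))
        parts.append(f"(@ {attrs})")
    text = (elem.text or "").strip()
    if text:
        parts.append(f'"{text}"')
    for child in elem:
        # Recurse into children; text following a child (its tail) is dropped
        # in this sketch.
        parts.append(to_sexp(child))
    return "(" + " ".join(parts) + ")"


# Hypothetical program fragment, not real JavaML markup.
SAMPLE = '<method name="main"><return-type>void</return-type><block/></method>'

print(to_sexp(ET.fromstring(SAMPLE)))
# prints: (method (@ (name "main")) (return-type "void") (block))
```

The same walk over the tree could just as well emit Graphviz nodes or indented source text; only the rendering step changes.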

I agree that before XML had gained momentum as a standard XML was pretty much worthless, but it sure isn't now.

I don't think it's so much a "language independent way" but just a "standard way." It probably has to be "language independent" because there is so much evangelical language advocacy around that a language dependent way could never become a standard.

Example of real problem made easier:

I want to pretty print my Java programs so that they are easier to read on my 30 column iPAQ editor screen.

iPAQs run Personal Java (basically Java 1.1), so I have to port pretty printers written to run with Java 2.0. Every pretty printer has its own way of doing parse trees.

I have to wade through umpteen different ways of doing parse trees. If there was some standard for parse trees, e.g. using XML/DOM there'd be one less thing to think about and I could spend more time working on the real meat, i.e. semantics, i.e. useful features for users. A parse tree is merely a syntax-sugar tool.

To me XML/DOM/Schemas/XSLT/......etc technologies are a great real world example of the power of OnceAndOnlyOnce eliminating redundancy and paving the way for re-use.

Other dreams:

  1. Writing the mathematical parts of my program in TeX [PassiveTex, Various Other Converters].
  2. Using graphically displayed trees for Prolog traces (Bird Boxes) rather than the linear numbering/indentation approach.
  3. Watching Haskell evaluation take place graphically.

XML graph-tree formats would probably facilitate this across different implementations and platforms.

Anton van Straaten - Re: XLANG and WSFL: Syntactic Arsenic
7/19/2002; 12:58:24 PM (reads: 1429, responses: 1)
I heartily agree with Jon. Regardless of the technical issues with XML, it is an extremely useful standard. It's really the first-ever widely accepted standard for representation of structured data. (AFAIK, that statement can only be quibbled with by raising relatively obscure and much more focused standards like XDR, ASN.1, maybe IIOP, etc.) Some people like to trot out s-expressions as an older, superior system; but for whatever reason (perhaps the use of wimpy parentheses instead of aggressive angle brackets), s-exps did not turn into a global, cross-industry, multi-purpose standard for structured data representation.

What Windley is complaining about is actually a consequence of XML which (in theory) should be welcomed on this board: XML has made it much easier, in practice, to develop systems that rely on metadata and domain-specific languages and meta-languages, even if those languages happen to use XML syntax. This provides the potential for software development to move in positive directions, to a higher semantic level - and as a result, in many cases, it actually has.

Applications which use XML in this way would, not very long ago, have needed to be developed using the traditional scanner/lexer/parser approach, with a BNF-style grammar and a non-standard tool needed to generate the parser from the grammar. Although parser generators are freely available, none of them actually rise to the level of being "standard", and their use is not something that every developer is familiar with. With XML, a developer can create a grammar on the fly, by example.

This is perhaps one of the most important aspects of XML: plenty of research shows that humans learn and teach well by example, and the fact that the grammar for an XML document can be inferred from the document itself is a major benefit. My experience indicates that an enormous percentage of XML currently in use does not in fact have a DTD, or if it does, the DTD was automatically generated from a sample document, not hand-written.
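To make "inferring the grammar from the document" concrete, here is a minimal Python sketch that derives a very loose DTD from one sample document. The sample vocabulary (addressbook, entry, name, email) is invented for illustration, and the inference rules are deliberate simplifications: every observed child is treated as optional and repeatable, every attribute as optional CDATA, and any element without children as text-only. Real DTD generators are more careful.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def infer_content_model(root):
    """Collect, per element name, the child names and attribute names seen."""
    children = defaultdict(set)
    attributes = defaultdict(set)
    for elem in root.iter():
        children[elem.tag].update(child.tag for child in elem)
        attributes[elem.tag].update(elem.attrib)
    return children, attributes


def to_dtd(children, attributes):
    """Emit a permissive DTD: any observed child, in any order, any number of times."""
    lines = []
    for tag in sorted(children):
        kids = sorted(children[tag])
        model = "(" + " | ".join(kids) + ")*" if kids else "(#PCDATA)"
        lines.append(f"<!ELEMENT {tag} {model}>")
        for attr in sorted(attributes[tag]):
            lines.append(f"<!ATTLIST {tag} {attr} CDATA #IMPLIED>")
    return "\n".join(lines)


# Hypothetical sample document.
SAMPLE = (
    '<addressbook>'
    '<entry id="1"><name>Ann</name><email>ann@example.org</email></entry>'
    '<entry id="2"><name>Bob</name></entry>'
    '</addressbook>'
)

print(to_dtd(*infer_content_model(ET.fromstring(SAMPLE))))
```

A grammar inferred this way only describes what the samples happened to contain, which is part of what the later replies argue about.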

XML has lowered the barriers to entry to an important area of software development, and that is a benefit to experts and average developers alike. To "free [Windley] from XML tyranny", something equally capable would have to replace it. I have yet to see anything that does. This is a good measure of whether or not XML provides anything new or unique - it clearly does. If anyone has any counter-examples, of existing technologies that could, in their currently existing state, replace XML in all its aspects, I'd love to hear them.

Most of what I've seen touted as a possible replacement is along the lines of "if you simply added/enhanced/defined...", which is fine, but no-one has actually done the adding/enhancing/defining which would be needed to actually replace XML. Ah, the peskiness of the real world, wanting things to actually *exist* all the time... The XML cow is not spherical, and that's the point.

Ehud Lamm - Re: XLANG and WSFL: Syntactic Arsenic
7/19/2002; 2:14:47 PM (reads: 1481, responses: 0)
XML has made it much easier, in practice, to develop systems that rely on metadata and domain-specific languages and meta-languages

I raised this issue several times here on LtU, claiming that designing good XML vocabularies (e.g., SVG?) requires insights that come from language design.

Oleg - Re: XLANG and WSFL: Syntactic Arsenic
7/19/2002; 5:21:46 PM (reads: 1456, responses: 0)
A co-worker of mine suggested the following XML format for GRIB data. GRIB is a World Meteorological Organization (WMO) format for gridded data -- e.g., values of air temperature over an equally spaced lat/lon grid, predicted by an MM5 weather model for 12Z tomorrow. GRIB is a binary and an extremely convoluted format. The latter is not the consequence of the former -- but is the consequence of a design by a large committee. Here's the proposed XML format:
	<GRIBDATA>
	  <BIT>0</BIT>
	  <BIT>1</BIT>
	  <BIT>1</BIT>
	  ...
	</GRIBDATA>
Gee, since XSLT is Turing complete, we can perform any possible transformation on a document in the above form.

For a bit more serious (and officially blessed) XML format for GRIB, see
http://zowie.metnet.navy.mil/~spawar/JMV-TNG/XML/JMGRIB.html

This example, and that of SVG, among many others, point out that designing an XML format is highly non-trivial. The non-trivial part is domain modeling. On one hand, XML makes things easier -- "standard serialization". OTOH, XML makes the designer explicitly think about compactness. Note how careful SVG is to avoid or at least ameliorate excessive bloat.
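The compactness concern is easy to quantify for the tongue-in-cheek one-element-per-bit format above. The following Python sketch just counts characters, assuming the two-space indentation and one element per line shown in the example; the function name is made up for illustration.

```python
def bit_per_element_size(n_bits):
    """Size in bytes of the <GRIBDATA> document holding n_bits <BIT> elements."""
    per_bit = len("  <BIT>0</BIT>\n")  # 15 bytes for every payload bit
    wrapper = len("<GRIBDATA>\n") + len("</GRIBDATA>\n")  # 23 bytes, paid once
    return wrapper + n_bits * per_bit


payload_bytes = 1000
xml_bytes = bit_per_element_size(8 * payload_bytes)
print(xml_bytes, xml_bytes / payload_bytes)
# prints: 120023 120.023
```

Under these assumptions each payload byte costs about 120 bytes of markup, before any of the actual GRIB structure has been modeled.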

I have to wade through umpteen different ways of doing parse trees. If there was some standard for parse trees, e.g. using XML/DOM there'd be one less thing to think about and I could spend more time working on the real meat, i.e. semantics, i.e. useful features for users. A parse tree is merely a syntax-sugar tool.

That's the real issue: portable representation for AST. That's the thrust of the ASDL project
http://asdl.sourceforge.net/
An AST can be printed out as XML, see
http://pobox.com/~oleg/ftp/Scheme/xml.html#executable-XML

It has become clear however that XML is a lousy notation for a real programming language
http://www.fawcette.com/xmlmag/2002_02/magazine/departments/endtag/

XML is aimed to be ASN.1 but more manageable. Alas, it is well on the same road ASN.1 has travelled, if XML Schema Part 1 (structures) is any indication.
http://www.w3.org/TR/xmlschema-1/

For example:

5 Let [Definition:] the wild IDs be the set of all attribute information item to which clause 3.2 applied and whose ·validation· resulted in a ·context-determined declaration· of mustFind or no ·context-determined declaration· at all, and whose [local name] and [namespace name] resolve (as defined by QName resolution (Instance) (§3.15.4)) to an attribute declaration whose {type definition} is or is derived from ID. Then all of the following must be true:
5.1 There must be no more than one item in ·wild IDs·.
5.2 If ·wild IDs· is non-empty, there must not be any attribute uses among the {attribute uses} whose {attribute declaration}'s {type definition} is or is derived from ID.
NOTE: When an {attribute wildcard} is present, this does not introduce any ambiguity with respect to how attribute information items for which an attribute use is present amongst the {attribute uses} whose name and target namespace match are ·assessed·. In such cases the attribute use always takes precedence, and the ·assessment· of such items stands or falls entirely on the basis of the attribute use and its {attribute declaration}. This follows from the details of clause 3.

Whoever finds this excerpt from XML Schema Part 1 easy to understand, please raise your hands.

It's really the first-ever widely accepted standard for representation of structured data.

Not really: first there are tables (serialized, e.g., as comma-separated files). Any structured data can be represented as a set of tables. For high-volume data tables are still being widely used. Then there is Postscript, which, along with Lisp, Scheme and TeX, represents data as programs. Postscript and its relative PDF are huge successes. Finally, there is a RIFF format and its instances AIFF and WAV audio file formats and the TIFF image format. I should also mention more scientifically oriented NetCDF and HDF file formats.

None of these formats will be displaced by XML. It is truly insane to represent multi-gigabyte data from space science missions in XML. The fact these formats are "niche" is also their strength. They concisely model their intended domain. BTW, XML is also a "niche" standard -- for semi-structural, text data. XML's grandparent, GML, was invented out of the need to represent in plain text "forms" (such as tax forms) and annotated documents such as legal briefs. For such applications, XML is nearly ideal. It's when we try to apply XML to domains such as programming languages, the representation of high-volume data, strictly structured data (as typically dealt with in database community) that XML's drawbacks become more and more annoying.

True, XML made parsing of data representations easier: although not as trivial as many people seem to think. Parsing of XML is quite complex -- made even more complex by the evolution of the standard.

My experience indicates that an enormous percentage of XML currently in use does not in fact have a DTD, or if it does, the DTD was automatically generated from a sample document, not hand-written.

And what is the reason for using such a DTD at all? DTD is a manifest type of a document, a contract between a consumer and a producer of data. An explicit DTD specification relieves a client from the trouble of discovering elements and attributes at run time. If a DTD says that a 'Name' element must have a 'FirstName' element as the first child, the consumer can confidently retrieve the first child of any 'Name' element it encounters, without the need for an existence check or for error handling. If DTD is generated from a document itself, what is the use of it?
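The difference a trusted contract makes on the consumer side can be sketched in Python as follows. Both function names are hypothetical, and the standard library parser used here does not validate against a DTD, so the first function is only safe if validation happened earlier with some other tool.

```python
import xml.etree.ElementTree as ET


def first_name_with_contract(name_elem):
    # Assumes the document was already validated against a DTD that
    # guarantees FirstName is the first child of every Name element.
    return name_elem[0].text


def first_name_defensive(name_elem):
    # Without such a contract, the consumer has to discover the structure
    # at run time and handle its absence.
    child = name_elem.find("FirstName")
    if child is None or child.text is None:
        raise ValueError("Name element has no usable FirstName child")
    return child.text


name = ET.fromstring(
    "<Name><FirstName>Ada</FirstName><LastName>Lovelace</LastName></Name>"
)
print(first_name_with_contract(name), first_name_defensive(name))
# prints: Ada Ada
```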

If anyone has any counter-examples, of existing technologies that could, in their currently existing state, replace XML in all its aspects, I'd love to hear them.

The modality of the question makes it easy. Of course there are: S-expressions, ASN.1, comma-separated-value tables, ASDL. The reasons many of these technologies "didn't make it big" have a lot more to do with politics and marketing than with the technical merits. BTW, the question elicits another one: do we need such a generic syntax? Maybe a more domain-specific syntax turns out to be preferable. Witness how bad XML is with high-volume data such as hi-resolution imagery. Maybe we need generic meta-syntax tools such as Zephyr rather than generic syntaxes?

Jay Han - Re: XLANG and WSFL: Syntactic Arsenic
7/20/2002; 2:36:51 PM (reads: 1453, responses: 2)
Even though I agree that XSLT has been a good vehicle to bring attention to transformation, XML itself is still a pretty weak tool - just observe controversies surrounding WXS, PSVI, processing model (or lack thereof), etc.

Ehud Lamm - Re: XLANG and WSFL: Syntactic Arsenic
7/21/2002; 12:11:18 AM (reads: 1510, responses: 1)
Could you perhaps give links to papers discussing these issues?

jon fernquest - Re: XLANG and WSFL: Syntactic Arsenic
7/21/2002; 3:43:47 AM (reads: 1381, responses: 1)
> Could you perhaps give links to papers discussing these issues?

Here are at least some terminological definitions:

WXS = W3C XML Schema

PSVI = post schema validation infoset

"The Infoset is essentially a distillation of the vital parts of an XML document after it's been parsed. According to Thompson, the Infoset leaves out the "uninteresting" parts of a document, such as whether attribute values use double or single quotes, the amount of whitespace outside of elements, and whether empty elements are written with one tag or two. Of course, Thompson noted, XML editing applications needed to know these things but XML processing applications don't.

Systems that operate on XML documents can be thought of as processing pipelines for infosets. When a document is parsed, an infoset is created, which may then be validated against a schema, after which the infoset is augmented with type information. The resulting infoset is called the "post schema validation infoset" or PSVI. The infoset may then have an XSLT transform applied to it, finally being serialized back to XML. In this world, the XML documents we are all used to, angle brackets and all, become merely hosts for the propagation of infosets." (Source)
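The point about "uninteresting" lexical details can be checked with a small Python sketch. It compares two documents that differ only in quote style, empty-element syntax, and whitespace after the root element. The tuple built by infoset_view is an ad hoc stand-in, not the formal XML Information Set, and the helper name is made up for illustration.

```python
import xml.etree.ElementTree as ET


def infoset_view(elem):
    """Reduce a parsed element to (name, attributes, text, children)."""
    return (
        elem.tag,
        tuple(sorted(elem.attrib.items())),
        elem.text or "",
        tuple(infoset_view(child) for child in elem),
    )


# Double quotes, empty element written with one tag.
DOC_A = '<entry id="1"><empty/></entry>'
# Single quotes, empty element written with two tags, trailing whitespace.
DOC_B = "<entry id='1'><empty></empty></entry>\n\n"

same = infoset_view(ET.fromstring(DOC_A)) == infoset_view(ET.fromstring(DOC_B))
print(same)
# prints: True
```

Once parsed, nothing in the tree records which spelling the author used, which is why an editor needs more than the parsed tree while a processing application does not.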

> Even though I agree that XSLT has been a good vehicle to bring attention to transformation,
> XML itself is still a pretty weak tool - just observe controversies surrounding WXS, PSVI,
> processing model (or lack thereof), etc.

Some strong cathedral builder of a programmer will eventually come along and lead us out of the forest. Doubt whether committees ever will.

Does anyone know about mapping XML data to collections of objects (language dependent) using XML schemas and datatypes (language independent), using, for instance XPath to pick out subtrees, e.g. address book entries, to map to objects?
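For what the question describes, a minimal Python sketch might look like the following. It uses the limited XPath subset in the standard library to pick out entry subtrees and maps each one to an object. The address book vocabulary and the Entry class are invented for illustration, and no schema or datatype information is consulted: every field is simply read as a string.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    email: str


# Hypothetical address book document.
ADDRESS_BOOK = """
<addressbook>
  <entry><name>Ann</name><email>ann@example.org</email></entry>
  <entry><name>Bob</name><email>bob@example.org</email></entry>
</addressbook>
"""


def load_entries(xml_text):
    """Select each entry subtree with a path expression and build an object from it."""
    root = ET.fromstring(xml_text)
    return [
        Entry(
            name=e.findtext("name", default=""),
            email=e.findtext("email", default=""),
        )
        for e in root.findall("./entry")
    ]


for entry in load_entries(ADDRESS_BOOK):
    print(entry.name, entry.email)
```

A schema-driven binding tool would generate the class and the conversions from declared types instead of hard-coding them as done here.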

Interesting tidbit: "the Xml in Stratego signature is a lot like the classes of the DOM."

Jay Han - Re: XLANG and WSFL: Syntactic Arsenic
7/21/2002; 12:12:17 PM (reads: 1528, responses: 0)
A few links are collected in http://notes.antville.org/20020701/

See also eclectic, a summary of the XML-DEV mailing list. This week's discussion is on the processing model.

Jay Han - Re: XLANG and WSFL: Syntactic Arsenic
7/21/2002; 12:24:14 PM (reads: 1425, responses: 0)
I am not sure if "Some strong cathedral builder of a programmer will eventually come along and lead us out of the forest." The forest is growing bigger and bigger everyday.

Java/XML data binding has been around for a while, e.g. Castor and JAXB. For "picking out subtrees" see the following links. JXPath is XPath for Java objects and DOM. It is part of the Jex family of expression languages. See also http://notes.antville.org/topics/Development/68369/.

Jay Han - Re: XLANG and WSFL: Syntactic Arsenic
7/21/2002; 12:36:11 PM (reads: 1362, responses: 0)
Adam Bosworth (VP at BEA) writes in A Programming Paradox:
We need a language that can natively support XML as a data type and yet can gracefully integrate with the world of objects (Java or otherwise) and can take advantage of the self-describing nature of XML by supporting querying of its own variables. This language as used by humans will look like a programming language, not an XML grammar. This is the language we will use to convert from one XML format to another. This is the language we will use to synthesize complex XML documents for multiple sources and Web services. This is the language we will use to mediate between the world of XML messages and the world of Java or C# processes.

Anton van Straaten - Re: XLANG and WSFL: Syntactic Arsenic
7/21/2002; 1:44:09 PM (reads: 1379, responses: 0)
To address some points that Oleg raised:

In response to my claim that XML is "really the first-ever widely accepted standard for representation of structured data", I'll accept delimited tables as a primitive precursor, and NetCDF and HDF as niche alternatives. Postscript, PDF, xIFF, etc. are all too domain-specific to qualify.

Perhaps I should rephrase my claim: XML is the first data representation that has become sufficiently standard to achieve significant "network effects", and this is what makes XML so useful.

My experience indicates that an enormous percentage of XML currently in use does not in fact have a DTD, or if it does, the DTD was automatically generated from a sample document, not hand-written.
And what is the reason for using such a DTD at all?
Because many people develop XML formats informally, on an ad-hoc basis. Having come up with a format and some sample documents, they can then use a tool to generate a DTD from those documents, and that DTD then becomes the specification, with all the benefits that DTDs provide. A related benefit is that DTDs aren't actually essential, in many cases.

From a purely technical perspective, the ability to reliably infer a grammar from a document may seem like an irrelevancy; but I think that this has been quite a big factor in the usability and thus acceptance of XML.

If anyone has any counter-examples, of existing technologies that could, in their currently existing state, replace XML in all its aspects, I'd love to hear them.
The modality of the question makes it easy. Of course there are: S-expressions, ASN.1, comma-separated-value tables, ASDL.
Each of these is missing features that XML provides - human readability, self-descriptiveness, ability to infer a grammar from a document, etc. One could presumably define a standard based on S-expressions or delimited tables that provided all of these things along with other features of XML, but afaik, no-one actually has, so these things don't qualify as replacements "in their currently existing state".
The reasons many of these technologies "didn't make it big" have a lot more to do with politics and marketing than with the technical merits.
Ah! This goes to the heart of what I'm getting at. Although politics and marketing certainly play a role, as do historical factors - such as the fact that HTML paved the way for acceptance of XML - there's a missing factor in many of these technically-focused discussions, which is the human factor.

Weird and non-obvious things like redundancy, self-descriptiveness, simplicity, and standardization can have a big effect on usability. Dismissing the success of technologies like XML (another one that comes to mind is Java) as purely the result of politics and marketing is ignoring lessons that these technologies can teach about what is useful in "the real world".

As I said originally, I don't believe that the technology currently exists to "free [Windley] from XML tyranny". Purely technically, perhaps it does, but the gap between technically possible and real-world viable is larger than usually admitted in these discussions.