Lambda the Ultimate

XML Shallow Parsing with Regular Expressions
started 4/5/2003; 4:32:25 AM - last post 4/5/2003; 4:32:25 AM
Ehud Lamm - XML Shallow Parsing with Regular Expressions
4/5/2003; 4:32:25 AM
XML Shallow Parsing with Regular Expressions
(via Keith Devens)

The syntax of XML is simple enough that it is possible to parse an XML document into a list of its markup and text items using a single regular expression. Such a shallow parse of an XML document can be very useful for the construction of a variety of lightweight XML processing tools. However, complex regular expressions can be difficult to construct and even more difficult to read. Using a form of literate programming for regular expressions, this paper documents a set of XML shallow parsing expressions that can be used as a basis for simple, correct, efficient, robust and language-independent XML shallow parsing. Complete shallow parser implementations of fewer than 50 lines each in Perl, JavaScript and Lex/Flex are also given.
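To make the idea concrete, here is a minimal sketch in Python of a single-regex shallow parser. It does not reproduce the paper's expressions. The pattern below is a simplified assumption of my own: it recognizes comments, CDATA sections, processing instructions, a simplified DOCTYPE, end tags, start/empty tags with quoted attributes, and character data. The names `SHALLOW` and `shallow_parse` are hypothetical. A final fallback alternative consumes a stray `<`, so malformed input still produces a complete item list.

```python
import re

# Simplified shallow-parsing expression (an illustrative assumption,
# not the exact expression from the paper). Each alternative matches
# one markup or text item. Alternatives are tried in order at each
# position, and all groups are non-capturing so findall returns whole items.
SHALLOW = re.compile(
    r"""
      <!--.*?-->                              # comment
    | <!\[CDATA\[.*?\]\]>                     # CDATA section
    | <\?.*?\?>                               # processing instruction / XML declaration
    | <!DOCTYPE[^>\[]*(?:\[[^\]]*\])?\s*>     # DOCTYPE (simplified internal subset)
    | </[A-Za-z_:][-\w.:]*\s*>                # end tag
    | <[A-Za-z_:][-\w.:]*                     # start or empty-element tag ...
        (?:\s+[^\s=/>]+\s*=\s*(?:"[^"]*"|'[^']*'))*  # ... with quoted attributes
        \s*/?>
    | [^<]+                                   # character data
    | <                                       # stray '<' (malformed markup)
    """,
    re.VERBOSE | re.DOTALL,
)


def shallow_parse(xml: str) -> list[str]:
    """Split an XML string into a flat list of markup and text items."""
    return SHALLOW.findall(xml)


if __name__ == "__main__":
    doc = '<?xml version="1.0"?><p class="a>b">Hi <!-- note --><br/></p>'
    for item in shallow_parse(doc):
        print(repr(item))
```

Because every position in the input is matched by some alternative (either `[^<]+` or, at worst, the lone `<`), joining the items reproduces the original text exactly. That round-trip property is what lets a lightweight tool rewrite one item and pass the rest through unchanged. Note that the quoted-attribute alternative correctly keeps a `>` inside `class="a>b"` within the tag.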

Related to our ongoing "Native XML support" discussion.


Posted to xml by Ehud Lamm on 4/5/03; 4:33:13 AM