CDuce: a programming language adapted to the manipulation of XML documents
started 7/3/2003; 4:45:30 PM - last post 7/27/2003; 9:30:35 AM
Ken Shan - CDuce: a programming language adapted to the manipulation of XML documents
7/3/2003; 4:45:30 PM
CDuce: a programming language adapted to the manipulation of XML documents
Our point of view, and our guideline for the design of CDuce, is that a programming language for XML should take XML types (DTD, XML Schema, Relax-NG, ...) seriously. The benefits are the following:
- static verification (e.g., ensuring that a transformation produces a valid document); in particular, we aim at smooth and safe composition of XML transformations, and at incremental programming;
- static optimizations and an efficient execution model (knowing the type of a document is crucial to extracting information efficiently).
Some of CDuce's distinctive features:
- XML objects can be manipulated as first-class values: elements, sequences, tags, characters and strings, attribute sets; sequences of XML elements can be specified by regular expressions, which also apply to character strings;
- functions themselves are first-class values: they can be manipulated, stored in data structures, returned by a function, ...;
- a powerful pattern-matching operation can perform complex extractions from sequences of XML elements;
- a rich type algebra, with recursive types and arbitrary boolean combinations (union, intersection, complement), allows precise definitions of data structures and XML types; general-purpose types and type constructors are taken seriously (products, extensible records, arbitrary-precision integers with interval constraints, Unicode characters);
- polymorphism through a natural notion of subtyping, and overloaded functions with dynamic dispatch;
- a highly effective type-driven compilation scheme.
Preliminary benchmarks suggest that despite the overhead of static type verification, a CDuce program can run faster (30% to 60%) than an equivalent XSLT stylesheet (we performed benchmarks with the xsltproc tool from the Gnome libxslt library).
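To give a flavor of the features listed above, here is a small hypothetical sketch (not part of the announcement) in the demo syntax used later in this thread; the Person/Children/Name types are the ones discussed below, and transform (match-and-concatenate over a sequence) and @ (sequence concatenation) are assumed to behave as in the CDuce documentation:
type Person   = <person>[ Name Children ]
type Children = <children>[ Person* ]
type Name     = <name>[ PCDATA ]
(* collect every name string in a tree of persons: <name>n binds the
   textual content of the name element, <children>c binds the sequence
   of sub-persons *)
let names (Person -> [String*])
  <person>[ <name>n <children>c ] ->
    [n] @ (transform c with x -> names x)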
Posted to xml by Ken Shan on 7/3/03; 4:46:27 PM
Oleg - Re: CDuce: a programming language adapted to the manipulation of XML documents
7/4/2003; 9:01:51 PM
OK, I'll bite. I'm quite irritated by the following claim:
- static verifications (e.g.: ensure that a transformation produces a valid document);
because the above goal is not attainable short of proving the whole transformation code with ACL2 or a similar system. I'm quite disturbed when people replace definitions with ones they find more agreeable. In the context of XML, "valid XML document" means only one thing: the satisfaction of all validity constraints specified in the XML 1.0 Recommendation. I must emphasize all constraints. These constraints include not only the content-model constraints (e.g., that an element declared with empty content must indeed have empty content). A valid XML document must also satisfy the following constraints:
- Validity constraint (ID): Values of type ID must match the Name production. A name must not appear more than once in an XML document as a value of this type; i.e., ID values must uniquely identify the elements which bear them.
- Validity constraint (IDREF): Values of type IDREF must match the Name production, and values of type IDREFS must match Names; each Name must match the value of an ID attribute on some element in the XML document; i.e., IDREF values must match the value of some ID attribute.
- Validity constraint (Fixed Attribute Default): If an attribute has a default value declared with the #FIXED keyword, instances of that attribute must match the default value.
It's plain that to represent such constraints in a type system we must
have a fully value-dependent type system. People may have different opinions
about the merits of the above constraints and the wisdom of including
them. These opinions are ultimately irrelevant: until a new version of
the XML Recommendation rescinds those constraints, we are obliged to
follow them all if we would like to call our XML
documents valid.
Let us consider the following XML transformation problem: we have two
documents of a type
<!DOCTYPE Foo [
<!ELEMENT Bar EMPTY>
<!ATTLIST Bar Attr ID #REQUIRED>
]>
and we want to merge them (into a document of the same type). Clearly we cannot write a transformation stylesheet that assuredly produces a valid result for every valid input document. The reason is that the values of the attribute 'Attr' must be unique throughout the whole document. Each input XML document may have unique values of the attribute Attr; the result of their merger may have duplicates. We are not at liberty to change the values of attributes to guarantee their uniqueness: the values of attributes may have an independent meaning (e.g., part codes, catalog codes, etc.).
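For concreteness, an invented illustration (the Foo content model and the attribute values are made up for the example):
<!-- first input, valid on its own -->
<Foo><Bar Attr="p42"/></Foo>
<!-- second input, also valid on its own -->
<Foo><Bar Attr="p42"/></Foo>
<!-- their naive merge violates the ID validity constraint:
     the ID value "p42" now appears twice -->
<Foo><Bar Attr="p42"/><Bar Attr="p42"/></Foo>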
So, we must introduce dynamic, run-time checks. We must formulate the type-soundness theorem for our transformation code in the following way: given valid input documents, the transformation code, if it terminates without error, will generate a valid document.
Now I jump up and say: I can give exactly the same kind of assurance without
any typing. I will transform input XML documents with SXSLT, and then
run the result through a validating XML parser. If the parser reports
no error, I'll send the result to the client. In both cases the client
is guaranteed to receive a valid XML document. Thus, in case of XML
transformations, static typing alone cannot give assurances that the
produced document is valid. As far as validity of the XML documents is
concerned, static typing doesn't seem to convey benefits over dynamic
typing.
The following message (posted on the SSAX-SXML mailing list) makes the
same point with more evidence and less anger:
http://sourceforge.net/mailarchive/forum.php?thread_id=1562776&forum_id=599
castagna - Re: CDuce: a programming language adapted to the manipulation of XML documents
7/11/2003; 6:07:22 AM
Thanks a lot for pointing this out. Even though it is on our main page, we overlooked it. Of course you are right, and we should have specified "valid w.r.t. all the constraints that can be expressed in the type system". Now it is done (is it ok?).
And what about the rest? I have to admit that you have not convinced me that "as far as validity of the XML documents is concerned, static typing doesn't seem to convey benefits over dynamic typing". Earlier detection of type errors helps me a lot in the development phase. But I confess I do not know SSAX-SXML, so we should definitely have a look at it. Could you tell us a good starting point?
In any case, besides static safety, one of our points is that having types allows you to have a *very* fast implementation. This is especially true for XML, where, if you know the type of your tree, then checking just the tag gives you a lot of information about what is below and lets you avoid a lot of useless checks; we do believe that CDuce proves it. You see this especially on computationally intensive executions (roughly, for transformations in which the load-and-parse time is not much larger than the execution time). I warmly invite you to check it in person by trying our beta release and comparing execution times with your favorite XML processor. If you do, please let us know your results; we are very interested in them, both negative and positive.
Also, if you have any questions, remarks, or suggestions, we are eager to receive them. Please post them _also_ on one of our mailing lists (e.g. devel@cduce.org or users@cduce.org). And, you know, not all that's wrong was written wrongly on purpose ;-)
Cheers
---Beppe
P.S. I forgot: the Fixed Attribute Default constraint can be verified in CDuce, as the type system has singleton types (types that contain just one value).
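A hedged sketch of the idea, reusing the attribute-type syntax from the split example later in this thread (the element name and the fixed value are invented):
(* the DTD fragment
     <!ATTLIST bar attr CDATA #FIXED "v1">
   can be mirrored with a singleton attribute type *)
type Bar = <bar attr = "v1">[]
(* a program that builds a <bar> element whose attr is anything other
   than "v1" does not have type Bar, so the constraint is caught
   statically *)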
Isaac Gouy - Re: CDuce: a programming language adapted to the manipulation of XML documents
7/11/2003; 7:37:08 AM
Now it is done (is it ok?)
font-size:8px
My poor old eyes struggle with fine print this small, especially when it's a fixed pixel size while the other text is in em ;-)
Sjoerd Visscher - Re: CDuce: a programming language adapted to the manipulation of XML documents
7/11/2003; 8:23:57 AM
Clearly we cannot write a transformation stylesheet that assuredly produces a valid result for every valid input document.
But it is possible to detect that the code may produce invalid results, so the compiler could issue a warning. When the transformation stylesheet is changed to capture and resolve ID conflicts, the compiler could detect that too, and compile without error. I don't know enough about type systems to tell how feasible this is, but it would be a very useful feature to have.
castagna - Re: CDuce: a programming language adapted to the manipulation of XML documents
7/11/2003; 10:14:33 AM
>font-size:8px
>
>My poor old eyes struggle with fine-print this small
SORRY, on my 19" screen it looked nice, but now I am on my laptop and I see how awful it is. Changed to font-size:80%. Better, isn't it?
castagna - Re: CDuce: a programming language adapted to the manipulation of XML documents
7/11/2003; 10:28:47 AM
Clearly we cannot write a transformation stylesheet that assuredly produces a valid result for every valid input document.
Of course we can (write the identity transformation, and it will always produce a valid result from any valid input document). No, what we cannot do is statically decide, for every possible transformation, whether it produces a valid result for every valid input document.
Detecting this is out of reach of type systems. If you have to decide whether a given attribute value is unique, and the values of attributes are computed by arbitrary expressions, this is equivalent to deciding whether a given expression will return a given value. We can make good approximations with static analysis techniques (and thus issue warnings when this may happen), but only a dynamic check can verify it.
On the other hand, the type system can find very nasty errors. For instance, look at this declaration:
<!ELEMENT person (name,children)>
<!ELEMENT children (person+)>
<!ELEMENT name (#PCDATA)>
You see the problem? There is no document (without refs) that can be valid: a person contains a sequence of children that contains a non-empty sequence of persons, and so on, generating an infinite tree.
Let us write the same type in CDuce and look at the result (taken verbatim from the online demo, no color added):
type Person = <person>[ Name Children ]
type Children = <children>[Person+]
type Name = <name>[PCDATA]
Warning at chars 57-76:
type Children = <children>[Person+]
This definition yields an empty type for Children
Warning at chars 14-39:
type Person = <person>[ Name Children ]
This definition yields an empty type for Person
Ok.
We took special care to localize errors and suggest solutions. I can give you more significant examples if you want, but you can just try it yourself by taking the examples available in the online demo and putting random errors in them.
Cheers
---Beppe
Oleg - Re: CDuce: a programming language adapted to the manipulation of XML documents
7/18/2003; 1:46:30 PM
When I write a factorial function in a statically typed language, and
it compiles, the compiler does not guarantee that the function gives
the right result (let's disregard dependent and undecidable type
systems as they are not common). The compiler will only guarantee that
my function, if it ever finishes without an error, will return
some integer. If the XML working group had included a specialist in programming languages, he would hopefully have explained to the rest of the group the benefits of separating some aspects of data (the ones that can be statically checked) from the other aspects. Perhaps the group would have introduced a category of "structurally valid XML documents" that is not burdened with "ID uniqueness" validity constraints. I must note that RELAX NG does include such a category and does draw the separation between the statically checkable and the other aspects of XML documents [1]. It seems James Clark learned a lot from XDuce.
There is an important aspect however that sets XML processing apart:
the aspect of trust and reliability. The compiler and the typechecker
implicitly trust the run-time system. If they compile my monomorphic
factorial function, the function's code can be sure it will be given
one and only one argument, and that argument will be an integer. The
function will never receive a "half" of an integer. These assurances
do not generally apply to XML processing, where a producer and a
consumer of an XML document are generally separate entities,
communicating via an untrusted and unreliable medium. If you write an
XML processing application that takes XML documents of doctype A, and
I give you an XML document that claims to be of doctype A, would you
take my word for it? In this day and age, it would be quite unwise to
do so. Therefore, you would need to parse and validate my
document against the document type declaration for doctype A. You
write: "This is especially true in XML where if you know the type of
your tree then checking just the tag gives you a lot of information
about what is below, and to avoid a lot of useless checks, and we do
believe that CDuce proves it." Indeed, after you verified
that the incoming document is of the right doctype, you already know a
lot about the document, and the processing application proper can take
advantage of that information and avoid useless checks. However,
before your application gets to work you must first do the full
validation parse of the incoming document. To validate that the
document is of the right doctype you have to check the entire document
-- even the parts that your processing application might not care
about. If a processing application repeatedly traverses the whole
document, then the cost of the full validation check diminishes in
comparison with the processing cost. If the processing application
does a single scan (of a part) of a document, the following optimistic
scenario might be preferable.
Verifying the doctype of incoming documents is a pessimistic
scenario. Allow me to suggest an optimistic, "untyped" alternative,
which might work better in some circumstances. Suppose my processing
application extracts several pieces of data from some elements and
attributes of an incoming XML document. I won't care what the type of
the incoming document is. I parse the document without validation, and
get to work. Of course, I have to check that all the needed pieces of data are indeed present. If they are, I finish successfully. If I miss something, I roll back and signal an error. If I only scan a part of the document, I don't need to check the parts I don't care about. Thus the total number of checks (parsing + processing) may be smaller in the optimistic scenario than in the pessimistic one. Furthermore, for some transformations, I may be able to do the processing as I parse the document. The SSAX parser [2] is especially conducive to such an
approach. Therefore, I may be able to process the document in a
stream-wise fashion. I can receive the document from one pipe and send
the transformed document through another pipe, without any need to
store the whole document in memory. This processing mode, when it
applies, dramatically reduces the processing latency and lets me
handle huge documents that won't fit into memory. Regarding the
roll-back and error signaling: HTTP 1.1 conveniently provides for sending out-of-band notifications to a client. In the best case, my
client will receive the result with little latency. In worse cases, I
simply tell the client to forget what I have said during the current
transaction, and roll back.
The optimistic approach has another advantage. Suppose my application
is designed to handle documents of the following structure
<!DOCTYPE Foo PUBLIC "T1" [
<!ELEMENT A EMPTY>
<!ELEMENT Foo (A)+>
]>
That is, I expect that the root element is named 'Foo' and it contains
one or several children named 'A'. Suppose you have sent me a document
of a type T2:
<!DOCTYPE Foo PUBLIC "T2" [
<!ELEMENT A EMPTY>
<!ELEMENT Foo (A)*>
]>
If I am using a pessimistic, statically typed approach, I should
reject your document: T2 is not a subtype of T1 (it's actually the
other way around). Documents of a type T2 may have no elements named
'A' in the content of the root element. However, there may be some
(many?) documents of type T2 that will contain at least one element
'A' as a child of 'Foo'. If the particular document you sent me is
indeed of that class, I can handle it. The optimistic, "dynamically typed" approach, so to speak, permits me to be opportunistic. I hasten to add that whether being opportunistic is a good thing depends entirely on the application. For data mining, being opportunistic is good. OTOH, if an incoming XML document is an order to perform a serious action, the application had better be sure the order is authentic. The full validation of the incoming document is one step in authentication.
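In CDuce-like syntax the opportunistic check amounts to matching the incoming document against the shape the application actually needs, rather than trusting the declared doctype. A hedged sketch (AnyXml and raise are assumed to behave as in the CDuce manual; the names are invented):
type EltA = <A>[]
(* the shape the application actually needs: a root Foo with at
   least one A child, regardless of the manifest type T1 or T2 *)
type Good = <Foo>[ EltA+ ]
let check (doc : AnyXml) : Good =
  match doc with
      g & Good -> g
    | _ -> raise `NotHandled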
This brings us to the point that validation is far trickier an issue
than it may appear. In fact, a consumer and a producer can use
different schemas to validate the document against. James Clark talks
at some length why a consumer or an author might need different
schemas, and about the complexities of validation [3]. For that
reason, RELAX NG does not provide a "standard" way of associating a
schema with an XML document. The type of an XML document is in the eye
of the beholder. There may be several beholders, not related by any
particular hierarchy.
I should also point out that even well-formedness is far less clear
cut than it may appear. We saw that some applications might wish to
accept invalid XML documents -- or the documents with a wrong manifest
type -- as long as the document contains all the data the application
needs. One can make a case for an application that accepts even
ill-formed XML documents. For example, let's consider processing of
invoice records, which come in the following long XML document:
<Invoices>
<Invoice trans_id='1' cust_id='2' product_id='3'/>
<Invoice trans_id='2' cust_id='22' product_id='23'/>
...
My process is to load invoices into a relational database. Suppose I
receive the document in question from an HTTP pipe. In the optimistic
scenario, I can start processing right up-front, loading the data in
the database as I parse them. Suppose I processed all the records, got
to the end of the document. And then, when I was about to read the
closing </Invoices>, the network connection broke
down. Stuff happens. What should I do? Should I roll back what I have
loaded? One may argue that I should: had I downloaded the document
first, I would have noticed that it is ill-formed, and rejected
it. One may also argue that each invoice comes in its own transaction,
so I should keep the loaded records.
These examples show that there are several tenable points of view on the validity and typing of XML documents. I believe it's important to be aware of the variety, so we can choose the approaches that best fit our problems.
But I confess I do not know SSAX-SXML, so we should definitely have a look at it. Could you tell us a good starting point?
http://ssax.sourceforge.net/
You might be particularly interested in papers about SXSLT.
[1] "The RELAX NG TC spent a considerable amount of time considering
what support RELAX NG should provide for enforcing identity
(uniqueness and cross-reference) constraints. In the end, the
conclusion was that identity constraints were better separated out
into a separate specification. Accordingly, RELAX NG itself provides
no support for identity constraints.... Another reason is that it is
often desirable to perform grammar processing separately from identity
constraint processing. For example, it may be known that a particular
document is valid with respect to a grammar but not known that it
satisfies identity constraints. The type system of the language that
was used to generate a document may well be able to guarantee that it
is valid with respect to the grammar; it is unlikely that it will be
able to guarantee that it satisfies the identity constraints. A
document assembled from a number of components may be guaranteed to be
valid with respect to a grammar because of the validity of the
components, but this will often not be the case with identity
constraints. Even when a document is known to satisfy the identity
constraints as well as be valid with respect to the grammar, it may be
necessary to perform identity constraint processing in order to allow
application programs to follow references."
James Clark: The Design of RELAX NG. December 6, 2001.
<http://www.thaiopensource.com/relaxng/design.html>
[2] SSAX XML parser
http://ssax.sourceforge.net/
In particular, the PADL02 paper.
[3] "By layering type assignment on top of
RELAX NG validation, applications that do not require type assignment
do not need to pay the cost for a feature that they do not use. It is
also more flexible. For example, it allows there to be two schemas for
a document: one strict schema that captures as many constraints as
possible but does not satisfy the restrictions necessary for type
assignment and another looser schema that cannot express all the
constraints but which can be used for type assignment. One advantage
to not performing type assignment with RELAX NG is that RELAX NG works
well with existing APIs such as DOM and SAX. Type assignment would
require major changes to report the types assigned to elements and attributes.
With XML 1.0, the XML document uses a DOCTYPE declaration to identify
the DTD with respect to which it is valid. There is no provision for a
document to be validated with respect to DTD that is specified
independently of the document. This is unsatisfactory for
interchange. When a document recipient receives a document from an
untrusted sender, the recipient may need to check that the document is
valid with respect to a particular DTD. The recipient cannot assume
that the DOCTYPE declaration of the document correctly identifies that
DTD. The recipient may want to validate against a DTD different from
that used by the author: for example, the recipient may validate
against a generalized, public DTD, whereas the author may validate
against a restrictive, private DTD that is a subset of the public
DTD. Unlike XML 1.0, RELAX NG does not tie a document to a single
schema. The RELAX NG validation process has two inputs: a document and
a schema against which to validate the document.
In fact, RELAX NG does not define any mechanism for associating a
document with a RELAX NG schema. Although it is useful to be able to
specify rules for determining the schema to be used to validate a
particular document, this problem is not specific to RELAX
NG. Validation is just one of many processes that can be applied to an
XML document. For example, a user may wish to perform XInclude [21]
processing or XSLT processing. A user may wish to perform validation
before or after any of these other processes. The problem of
associating a schema with a document is really just a special case of
the problem of associating processing with a document. What is needed
is a solution that can specify a series of processes to be applied to
a document."
James Clark: The Design of RELAX NG. December 6, 2001.
<http://www.thaiopensource.com/relaxng/design.html>
wh - Re: CDuce: a programming language adapted to the manipulation of XML documents
7/22/2003; 2:16:37 AM
I'd like to understand the motivation for overloaded functions in CDuce. It seems that the examples using overloaded functions in the white paper could just as well be expressed with mutually recursive functions (with different names). Are they simply a matter of convenience, or is there something specific about the way XML types are expressed/extended that requires overloading (and extensible overloading -- section 3.5 "incremental programming")? Thanks,
Warren
Alain Frisch - Re: CDuce: a programming language adapted to the manipulation of XML documents
7/26/2003; 12:05:41 PM
Hi,
Thank you Oleg for the interesting remarks.
I don't quite understand the distinction you make between the optimistic (untyped) and the pessimistic (typed) approaches. It is possible in CDuce to validate the input document against a type that constrains the structure of the document only partially. For instance, validating an XML document against the type <a>[ <b>Any <c>String ] will not check anything under the tag <b>. The application can still perform "untyped" computations on the content of the b element by explicit pattern matching.
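A hedged sketch of what this might look like (get_c is an invented helper; AnyXml and raise are assumed to behave as in the CDuce manual):
type Partial = <a>[ <b>Any <c>String ]
(* check only the outer shape; nothing under <b> is inspected *)
let get_c (x : AnyXml) : String =
  match x with
      <a>[ <b>_ <c>s ] -> s    (* s is statically known to be a String *)
    | _ -> raise `Invalid
The content of the b element can then be explored "untyped", by further pattern matching on demand.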
You also seem to imply that with the CDuce approach, the application necessarily has to load the whole XML document into memory. It is true that the current implementation loads the whole document into memory, but I see this as an implementation issue. It would be possible to implement the load_xml operator lazily, so as to parse the XML document on demand (this can be useful if validating against a "partial" type). A step further would give the whole language a lazy semantics, thus paving the way to stream processing.
I don't understand your point in the example with types T1 and T2. Why would the application reject the document with doctype T2? In CDuce at least, the application would reject the document only if it is empty. Indeed, CDuce does not use the doctype declaration when loading an XML document; instead, it performs validation on its content (a possible optimization would be to skip the validation if the doctype is a subtype of the expected type and the document comes from a trusted source).
Alain Frisch - Re: CDuce: a programming language adapted to the manipulation of XML documents
7/27/2003; 1:24:40 AM
[ note: answer also posted to the users@cduce.org mailing list. I propose to continue discussion of technical points specific to CDuce there. See the CDuce site for instructions on how to subscribe. ]
I'd like to understand the motivation for overloaded functions in CDuce. It seems that the examples using overloaded functions in the white paper could just as well be expressed with mutually recursive functions (with different names).
You're probably referring to the "split" function ("sort" in the white paper; note that our ICFP03 paper, to be presented, is an updated version of the white paper, available at http://www.cduce.org/papers.html).
type Person = FPerson | MPerson
type FPerson = <person gender = "F">[ Name Children ]
type MPerson = <person gender = "M">[ Name Children ]
type Children = <children>[Person*]
type Name = <name>[ PCDATA ]
type Man = <man name=String>[ Sons Daughters ]
type Woman = <woman name=String>[ Sons Daughters ]
type Sons = <sons>[ Man* ]
type Daughters = <daughters>[ Woman* ]
let split (MPerson -> Man ; FPerson -> Woman)
<person gender=g>[ <name>n <children>[(mc::MPerson | fc::FPerson)*] ] ->
let tag = match g with "F" -> `woman | "M" -> `man in
let s = map mc with x -> split x in
let d = map fc with x -> split x in
<(tag) name=n>[ <sons>s <daughters>d ]
The function could actually be expressed using mutually recursive
functions, like:
let split_common([ Person* ] -> [ Sons Daughters ])
[ (mc::MPerson | fc::FPerson)* ] ->
let s = map mc with x -> splitM x in
let d = map fc with x -> splitF x in
[ <sons>s <daughters>d ]
let splitM (MPerson -> Man)
<person>[ <name>n <children>c ] -> <man name=n>(split_common c)
let splitF (FPerson -> Woman)
<person>[ <name>n <children>c ] -> <woman name=n>(split_common c)
let split (Person -> Man|Woman) MPerson & x -> splitM x | x -> splitF x
To avoid duplicating code between splitM and splitF, we have to introduce a function split_common. We also introduce a function split to recover the fact that we want to apply the function to Person elements. Here it is easy to factor out the common code, but that is not necessarily the case in general.
Another use of overloading is to give more precise information about the result when one has more information about the argument. A simple-minded example:
type Nat = 0--*
let succ (Int -> Int; Nat -> Nat) x -> x + 1
Here, we would have to completely duplicate the body of the function:
let succInt (Int -> Int) x -> x + 1
let succNat (Nat -> Nat) x -> x + 1
Suppose we want to use succ like this:
type T = [ Int* Nat* ]
let succSeq (s : T) : T = map s with x -> succ x
If we want to use succInt/succNat instead, we have to dispatch between Nat
and Int in the caller:
let succSeq (s : T) : T = map s with x & Nat -> succNat x | x -> succInt x
But this dispatch is completely useless, because the dynamic behavior is the same in both cases. Now replace Nat and Int by complex XML types, and x -> x + 1 by a complex transformation, and you get the picture.
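For instance, a hedged sketch along these lines (the document types and the transformation are invented; singleton attribute types are assumed to be tracked as in the split example above):
type Draft = <doc status="draft">[ Para* ]
type Final = <doc status="final">[ Para* ]
type Para  = <p>[ PCDATA ]
(* one body and one dynamic behavior; the overloaded interface records
   that the status attribute is preserved by the transformation *)
let touch (Draft -> Draft ; Final -> Final)
  <doc status=s>[ (p::Para)* ] -> <doc status=s>p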
There are also probably interesting examples combining overloaded
functions and higher-order functions, but we haven't found a convincing
one yet.
Are they simply a matter of convenience, or is there something specific about the way XML types are expressed/extended that requires overloading (and extensible overloading -- section 3.5 "incremental programming")?
I'll let Beppe answer this part.
castagna - Re: CDuce: a programming language adapted to the manipulation of XML documents
7/27/2003; 9:29:23 AM
Just let me add an example to what Alain said. The function
type XMLdoc = <_>_
let f(XMLdoc -> XMLdoc) <m t> e -> g(m,t,e,f)
is a generic function that manipulates any well-formed XML document and returns another well-formed document. Well-formedness here is the only requirement (and the only property enforced on the result).
castagna - Re: CDuce: a programming language adapted to the manipulation of XML documents
7/27/2003; 9:30:35 AM
Are they simply a matter of convenience, or is there something specific about the way XML types are expressed/extended that requires overloading (and extensible overloading -- section 3.5 "incremental programming")?
I'll let Beppe answer this part.
[posting on users@cduce.org: see previous message]
OK, so let me add my two cents' worth. This is an old idea: using dynamic overloading to mimic object orientation. The rough idea is that code specialization can be obtained by incremental definition of overloaded functions. The act of adding a new method for a message msg corresponds to adding new code to (i.e., a new arrow in the interface of) the overloaded function msg.
To exemplify, let us consider the split function of our example. Imagine that we define a subtype of Person that also contains a nationality attribute:
type Citizen = FCitizen | MCitizen
type FCitizen = <person gender = "F" nat = String>[ Name Children ]
type MCitizen = <person gender = "M" nat = String>[ Name Children ]
Note that the Children are still Persons, so no nationality is recorded for them. Imagine now that we want to specialize the behavior of split. As it is, split also works with Citizens, but it completely loses the information about nationality. To make things harder, imagine instead that split has to record the nationality in the result type and propagate it to the children as well, that is:
type CMan = <man name=String nat=String>[ <sons>[CMan*] <daughters>[CWoman*] ]
type CWoman = <woman name=String nat=String>[ <sons>[CMan*] <daughters>[CWoman*] ]
Of course we can redefine split from scratch, but with static binding the new behavior will be available only to the new functions. If instead we implement specialization with dynamic binding, then the "old" calls to split will also enjoy the specialized behavior.
So, if split were defined in the module SomeModule, we could write (in a syntax that does not yet exist):
addmethod SomeModule.split (FCitizen -> CWoman; MCitizen -> CMan)
<person gender=g nat=nn>[ <name>n <children>[(mc::MPerson | fc::FPerson)*] ] ->
let tag = match g with "F" -> `woman | "M" -> `man in
let s = map mc with <person (atr)> elms -> split <person (atr+{nat=nn})> elms in
let d = map fc with <person (atr)> elms -> split <person (atr+{nat=nn})> elms in
<(tag) name=n nat=nn>[ <sons>s <daughters>d ]
[we add the nat attribute in the arguments of the recursive calls and in
the result]
This would roughly correspond to transforming all the existing calls to split into calls to the following function:
let split (MPerson -> Man ; FPerson -> Woman ; FCitizen -> CWoman ; MCitizen -> CMan)
| <person gender=g nat=nn>[ <name>n <children>[(mc::MPerson | fc::FPerson)*] ] ->
let tag = match g with "F" -> `woman | "M" -> `man in
let s = map mc with <person (atr)> elms -> split <person (atr+{nat=nn})> elms in
let d = map fc with <person (atr)> elms -> split <person (atr+{nat=nn})> elms in
<(tag) name=n nat=nn>[ <sons>s <daughters>d ]
| <person gender=g>[ <name>n <children>[(mc::MPerson | fc::FPerson)*] ] ->
let tag = match g with "F" -> `woman | "M" -> `man in
let s = map mc with x -> split x in
let d = map fc with x -> split x in
<(tag) name=n>[ <sons>s <daughters>d ]
Of course there is still a long way to go; first of all, we need a real module system.
---Beppe---