Lambda the Ultimate

inactiveTopic Tuples + Objects + Infosets =Too Much Stuff!
started 10/2/2003; 5:59:24 AM - last post 10/17/2003; 1:48:53 PM
Ehud Lamm - Tuples + Objects + Infosets =Too Much Stuff!  blueArrow
10/2/2003; 5:59:24 AM (reads: 13925, responses: 46)
Tuples + Objects + Infosets =Too Much Stuff!
Once upon a time it was possible for every new programmer to quickly learn how to write readable programs to Create, Read, Update and Delete business information. These so-called CRUD applications, along with reporting, were pervasive throughout business and essentially defined IT or MIS as it was called in those days... [Now] THINGS ARE SO COMPLEX YOU NEED AN M.SC. TO PROGRAM CRUD!

Eric says this is a great paper, so who am I to argue?


Posted to OOP by Ehud Lamm on 10/2/03; 6:01:43 AM

Chris Rathman - Re: Tuples + Objects + Infosets =Too Much Stuff!  blueArrow
10/2/2003; 6:34:29 AM (reads: 1788, responses: 2)
The author blames the proliferation of paradigms and frameworks for the complexity in databases. Then goes on to propose that we come up with even more things to learn in order to solve the problem.

Well, if we just dig more, we'll eventually get out of this hole. :-)

Ehud Lamm - Re: Tuples + Objects + Infosets =Too Much Stuff!  blueArrow
10/2/2003; 6:48:11 AM (reads: 1786, responses: 1)
Do you really think the problem is that we have to many paradigms and frameworks, or the fact that none of them actually offer a good solution to the technical difficulties?

Patrick Logan - Re: Tuples + Objects + Infosets =Too Much Stuff!  blueArrow
10/2/2003; 6:55:31 AM (reads: 1768, responses: 0)
I have not read the paper yet. I am dubious about the subheading "All You Need Is Infosets", so I look forward to the details.

As for the author, I give Dave a lot of credit. I've met and heard him at OOPSLA as well as at a Gemstone company-wide retreat where he gave a very challenging keynote. (Which was ignored, BTW. Look where that got us.)

I do think in IT a major source of complexity is the ever evolving myriad of languages and tools. On the other hand I think we need better design guidelines for specific purposes and scales. And I think we should expect these guidelines to change very much over the next few years as hardware capabilities improve in interesting ways.

Chris Rathman - Re: Tuples + Objects + Infosets =Too Much Stuff!  blueArrow
10/2/2003; 7:21:47 AM (reads: 1762, responses: 0)
Do you really think the problem is that we have to many paradigms and frameworks, or the fact that none of them actually offer a good solution to the technical difficulties?
My immediate problem is that I have to learn and retain an immense amount of trivia in order to get anything useful accomplished. My longer term problem is that the types of programs that I write are doing increasingly complex chores - changes in tools are required to keep up.

But if you were to ask me about the real underlying problem with our profession is the lack of clearly defined lines of specialization. Yes, translating from tuples-to-objects-to-infosets is a complex process, but why is it that it is something that practically every programmer has to fret about? Where's the specialists who's only job is to do that exact chore, to let the rest of us go our own way?

I don't think it's necessarily about developing the right tool to solve all our problems. It's about developing a whole industry of tools for the various lines of specialization that need to form in the field of software engineering.

Sjoerd Visscher - Re: Tuples + Objects + Infosets =Too Much Stuff!  blueArrow
10/2/2003; 7:44:09 AM (reads: 1764, responses: 3)
I actually believe in "all you need is objects".

Tuples are good for quick hacks, but really obfuscate the code, because the meaning of each item in a tuple is implicit. Also I find it hard to remember the order of things in a sequence. I know almost all of the browser DOM and CSS by heart, but I keep forgetting the order of arguments in functions like setTimeout and insertBefore, or the order of properties in the border shorthand.

The problem with objects is that the theory is really underdeveloped (or invisible). Hardly any OO language has basic object operators, f.e.:

[a:1, b:2] over [b:3, c:4] => [a:1, b:2, c:4] Really useful for implementing prototype based inheritance, or environments. (see the ext function in Definitional Interpreters for Higher-Order Programming Languages, http://lambda-the-ultimate.org/classic/messagx8879)

[a:1, b:[c:2]] & [b:[d:3]] => [a:1, b:[c:2, d:3]] I saw this one in Forsythe's type system. http://citeseer.nj.nec.com/reynolds96design.html

These operators might clash with what a lot of people see as OOP, but they are really useful. Think f.e. AOP and traits.

Patrick Logan - Re: Tuples + Objects + Infosets =Too Much Stuff!  blueArrow
10/2/2003; 8:08:52 AM (reads: 1735, responses: 0)
Think f.e. AOP and traits.

I think AOP is a solution looking for a problem. Simple notations like Smalltalk, as is, are suitable for most IT problems. I think it's great that people are looking at type systems, AOP, etc. in the applied lab, but most problems I have encountered need no such tools today.

What is needed is a domain model on the order of SAP implemented in a notation as simple as Smalltalk by a competent designer. Then what is needed is a way to go into large IT shops and replace entire systems overnight.

IT is a social disease.

Sjoerd Visscher - Re: Tuples + Objects + Infosets =Too Much Stuff!  blueArrow
10/2/2003; 8:25:10 AM (reads: 1729, responses: 0)
In BeyondJS there was a clear use for traits: f.e. we created a lot of methods that worked by only using the foreach method. That's a very good usecase for a Collection trait: implement foreach and include the Collection trait and your class automatically a whole range of collection methods.

Patrick Logan - Re: Tuples + Objects + Infosets =Too Much Stuff!  blueArrow
10/2/2003; 9:59:46 AM (reads: 1670, responses: 0)
Tuples are good for quick hacks, but really obfuscate the code, because the meaning of each item in a tuple is implicit. Also I find it hard to remember the order of things in a sequence.

BTW, I think Dave is specifically referring to relational databases, even SQL, when he uses the term "tuple" in this paper.

Patrick Logan - Re: Tuples + Objects + Infosets =Too Much Stuff!  blueArrow
10/2/2003; 10:01:22 AM (reads: 1669, responses: 0)
That's a very good usecase for a Collection trait: implement foreach and include the Collection trait and your class automatically a whole range of collection methods.

I'm not saying these things are not useful. I am saying they are not relevant to the current real problem in IT.

Isaac Gouy - Re: Tuples + Objects + Infosets =Too Much Stuff!  blueArrow
10/2/2003; 10:03:51 AM (reads: 1667, responses: 0)
Eric says this is a great paper
It's a clear, opinion piece that presents 3 choices:
  • muddle-on with the mismatch between SQL, OO & XML
  • hide the mismatch by generating boiler plate code
  • deal with the mismatch by designing more capable languages

and takes us to familiar ground "Unifying Tables, Objects and Documents"; XDuce; XStatic.

Alex Peake - Re: Tuples + Objects + Infosets =Too Much Stuff!  blueArrow
10/2/2003; 4:01:48 PM (reads: 1559, responses: 1)
Chris Rathman says:
Where's the specialists who's only job is to do that exact chore, to let the rest of us go our own way?


I think you are on the mark here! Relational Algebra and/or Tuple Relational Calculus are simple concepts to those that have learned

Similarly the processes of computing with the results of a Relational Database "result set" is straightforward.

Interestingly, it is mostly "formula" stuff, so even if you do not have an "expert" you could reasonably have an "expert system" to create this code for you and let you "move on".

Marc Hamann - Re: Tuples + Objects + Infosets =Too Much Stuff!  blueArrow
10/2/2003; 4:39:55 PM (reads: 1576, responses: 0)
Do you really think the problem is that we have to many paradigms and frameworks, or the fact that none of them actually offer a good solution to the technical difficulties?

Definitely too many frameworks solving imaginary problems.

But the more fundamental problem is that we spend too much time trying to find technological solutions to human problems. (The inverse of the human factors argument. ;-) )

There is no framework or standard or metadata scheme that will ever solve the problem that different people want different things at different times and can't describe what those things are most of time.

So all the great "standards" and frameworks that are supposed to give us all the flexibility we want just make it more and more complex to solve simple problems.

I think this is one of the fundamental strengths of a general purpose PL. If (mostly) gives you consistent, well-defined mechanisms to derive the solution that you actually need without necessarily piling on the cruft.

Patrick Logan - Re: Tuples + Objects + Infosets =Too Much Stuff!  blueArrow
10/3/2003; 9:49:48 AM (reads: 1424, responses: 2)
Some thoughts, not all coherent, reading through the paper...

"SQL is quite good for simple CRUD applications on normalized tables."

This seems to speak to OLTP. For OLAP, denormalized tables (3NF fact tables and 2NF dimension tables in a star schema) would be preferred. Still standard SQL does not support all the expressions you'd like in OLAP such as time series expressions.

For OLTP I am not convinced you want SQL at all. Something like Prevayler might be preferred. When we get large, battery-backed RAMs in a few years, we won't even care about writing transactions to disk.

"SQL programming often requires an alternative interface using cursors"

This is becoming somewhat less necessary in situations where set-based expressions are the ideal. Some databases like Teradata and Sybase IQ support set-based expressions efficiently. Even SQL Server is better at this than in previous versions.

"after many years of engineering, the relational databases can finally claim the performance and flexibility of keyed files...; network databases..."

Henry Baker has some great thoughts about this. I am kind of in the middle. One thing seems to be true, that funding for any kind of database other than relational is almost nothing. Object databases have had commercial funding, but they've been miniscule compared to the commercial relational database R&D.

What, for example, could have been done at Gemstone where indexing, query, and reporting for its OODB had well under one person year R&D during it's 20 years of development?

This has some applicability to XML too. Is XML a "random access database"? Or a "serialization" (with "includes"? with "pointers")?

"Third Generation Database Manifesto... objects... were syntactic extensions on Blobs"

Another approach in PostgreSQL and other DBs is to make tables like a "class" (whatever that is!) and one class/table can inherit from another. This is actually fairly useful for O/R mapping.

"Object databases, it was claimed, solved the impedence mismatch..."

Another note on star schemas, they simplify data models relative to 3NF models, and they partition data into dimensions, facts, and many-many relationships. Dimensions map fairly well into objects, facts map into observations or measurements among networks of objects. If you design your objects and your data with this in mind, the O/R mapping problem can be reduced for many common business (and other) scenarios.

"while there are some solutions (AS/400 and Gemstone persistent stores) that have been very successful..."

Dave gave a keynote at a Gemstone company retreat. He tried to marry Gemstone with AS/400, suggesting we could ignore the Java industry and make more money. I tend to believe him since AS/400 was already a successful niche with persistent data as a feature, and Dave was at the time with IBM (via his OTI subsidiary) and so had to have had some inside understanding of the economics.

This was the point where Gemstone in "the hopes of becoming the next Oracle" all but abandoned Smalltalk for Java/J2EE. For the next several years the Smalltalk market funded the Java development with about 3x developers for Java than ever worked on Smalltalk. I doubt the Java investment ever broke even, while Smalltalk continued to bring in revenue (at least as of a year or so ago).

As mentioned above, Gemstone hardly invested anything in query, indexing, and reporting for either Smalltalk or Java OODBs. Had the numbers assigned to Java been put into this, and perhaps the AS/400 port, not to mention the replication mechanism and servlet-like multiplexor which had just been developed on a shoestring, what could have been the result?

What if these had been developed and Gemstone purchased by IBM, which had been discussed many times even on Gerstner's floor in IBM?

"the brave new world of XML schemas and Infosets"

We'll see. Not too many business systems have been built on these yet. As mentioned above, it is not clear that XML is a random access database or a serialization or something else altogether. Nor is it clear where "includes" and "pointers" fit in. And what is a "relationship" in XML as in the relational database sense? Not entirely clear.

"It can be argued that given the ability to directly query both relational and XML data one can handle lots of problems without needing objects."

Objects are for abstractions. So are functions. So the comprehensiveness of the above statement depends on what "query" means and it depends on the query language.

"the lack of explicit XML values..."

This gets back to what is XML vs. some use of XML. Should there be one "data model" for XML? I doubt it.

"The impedence of incompatible type systems imposes..."

Everything is incompatible (e.g. "computation" and "data model" as well as "type"). An approach to some of the concerns in this paper may be better off *ignoring* XML(!), and going more into left field for potential solutions. Then those solutions may be able to be mapped back into XML for some purposes.

What is it with XML? We have some relatively primitive yet widespread tools for XML. But should this suggest our future data model, search, and computation problems are best solved "using XML", whatever myriad of mechanisms that means?

Dominic Fox - Re: Tuples + Objects + Infosets =Too Much Stuff!  blueArrow
10/3/2003; 3:07:38 PM (reads: 1371, responses: 1)

The argument of Fabian Pascal, Chris Date et. al. seems to be that relational databases have simply never been done right, and that most of the proposed solutions to the deficiencies of SQL databases rather add to the problem instead of remedying it to any degree.

Now I don't know. I only looked up the expression "transitive closure" fairly recently, and lo if it wasn't the name of something I was getting increasingly annoyed with SQL Server for apparently not being able to do; but I believe it can't (as Henry Baker asserts in the article referenced by Patrick's link, above) be expressed using the basic constituents of Codd's relational calculus either.

Date and Darwen's Third Manifesto states that D, their fully relational replacement for SQL, should have some convenient means of expressing transitive closure, and that's nice; but actually the ability to express queries that can do CRUD for graphs (e.g. trees) and bits of graphs (e.g. subtrees) is quite a big deal, and it does seem to be *missing* from the foundations of the relational model. That's the deficiency - well, one of the deficiencies - that OO/Relational "mapping" has to deal with, as objects - at least those that maintain pointers to other objects - tend to be laid out in just the sorts of graphs (e.g. trees) that it's difficult to deal with without transitive closure. I believe it's the same deficiency that makes some people turn to XML (although as soon as you want anything that isn't a straightforward tree you're back to ids and idrefs and might as well be working with tables anyway).

xeo_at_thermopylae - Re: Tuples + Objects + Infosets =Too Much Stuff!  blueArrow
10/4/2003; 12:29:10 AM (reads: 1319, responses: 2)
I actually _know_ that all you need are relations.

Objects are good for quick hacks, but really obfuscate the code, because the meaning of each object is implicit. If I find it hard to remember the order of things in a sequence, I can define a relation for each (thing).

The problem with objects is that any possible theory (if indeed any theory at all is possible) would be more clearly stated in terms of relations. But it is probably impossible to have a theory of object databases anything like relational database theory.

Patrick Logan - Re: Tuples + Objects + Infosets =Too Much Stuff!  blueArrow
10/4/2003; 6:41:39 AM (reads: 1279, responses: 0)
I actually _know_ that all you need are relations.

Objects are good for quick hacks, but really obfuscate the code...

This is an interesting claim. What size systems has this been tested on empirically?

xeo_at_thermopylae - Re: Tuples + Objects + Infosets =Too Much Stuff!  blueArrow
10/4/2003; 7:03:25 AM (reads: 1288, responses: 0)
Alex Peake says:

I think you are on the mark here! Relational Algebra and/or Tuple Relational Calculus are simple concepts to those that have learned

Do you mean that those are difficult to learn? Reason I ask is that the remainder of your post seems to imply that it is easy (which I believe is so).

Marc Hamann - Re: Tuples + Objects + Infosets =Too Much Stuff!  blueArrow
10/4/2003; 7:28:07 AM (reads: 1287, responses: 1)
I actually _know_ that all you need are relations

Hmmm. I think that this is true the same way that it is true that "all you is need is assembly".

Sure, any program you want can be done in assembly, but that doesn't mean it is convenient or manageable for all tasks.

xeo_at_thermopylae - Re: Tuples + Objects + Infosets =Too Much Stuff!  blueArrow
10/4/2003; 8:22:29 AM (reads: 1299, responses: 0)
It means that relational systems can provably cover a problem domain, whereas there is no provability for OO. Indeed there is no mathematical theory of OO whatsoever.

Many developers see relational theory as old and therefore infer that it is therefore unnecessary or defunct. But the truth is that relational theory is the only database theory available. It works and works well. Relational databases dominate the marketplace and will continue to do so.

SQL is a weak implementation of relational theory. Prolog is an excellent implementation. Sadly most developers lack the patience to learn even SQL; there is no hope that they could ever master Prolog.

As to the article: adding layers will not solve a problem. Also attempting to reframe a system in "Objects + Infosets" instead of "tuples" will only work in limited cases (and will result in a brittle structure that requires extensive modification when changes are made). That's why the relational model was developed, but much of that is unknown to developers today. Those of us who know will just have to wait until enlightenment strikes a generation of developers.

Most systems today can be written using a development system that relies on "tuples". For the most part it's a waste of resources and time to add additional layers of "Objects + Infosets". But it's the current craze. FWIW here's Erik Naggum's take on XML - the seed standard that originated all this cruft.

Alex Peake - Re: Tuples + Objects + Infosets =Too Much Stuff!  blueArrow
10/4/2003; 8:59:43 AM (reads: 1261, responses: 0)
xeo asks:

Do you mean that those are difficult to learn?

No, not difficult, it just takes the motivation.

For an implementation of the language "Tutorial D" from Date & Darwin see http://www.alphora.com/

Ehud Lamm - Re: Tuples + Objects + Infosets =Too Much Stuff!  blueArrow
10/4/2003; 9:17:27 AM (reads: 1291, responses: 0)
Perhaps we should focus this discussion and note that we are interested in programming language design, not database design.

And that makes all the difference...

Dominic Fox - Re: Tuples + Objects + Infosets =Too Much Stuff!  blueArrow
10/4/2003; 2:21:54 PM (reads: 1233, responses: 0)

The question, IIRC, was what tools (i.e. PLs) might help us to resolve the various messes that SQL and the various kludges layered on top of it recurrently get us in to. One view on this is that more tools - more layers, more kludges, more messes - isn't the answer. It's too late to start looking for elegant mappings of the domain and trying to encode these mappings in programming tools, because the fault is aboriginal and needs to be corrected at its origin.

D4 / Tutorial D looks really interesting to me, because it looks like an attempt to do just this; moreover, it does (in Alphora's implementation) actually retrofit over existing SQL DBMSes, so there's no need to tear everything down before you start building again. There is, as I've mentioned before, a fairly thorough discussion of the language on Alphora's site - it's only a shame that there isn't also a public domain / free software implementation available at present.

Patrick Logan - Re: Tuples + Objects + Infosets =Too Much Stuff!  blueArrow
10/4/2003; 4:43:05 PM (reads: 1209, responses: 0)
FWIW here's Erik Naggum's take on XML - the seed standard that originated all this cruft.

I have not been able to sit still long enough to parse the thoughts about XML in that item. (My printer is temporarily disconnected from my network; I'd have to print that item and diagram it to follow the logic.) I scan through run-on paragraphs containing sentences about robbery, rape, Enron, and the GOP. Maybe you could summarize the essence?

Patrick Logan - Re: Tuples + Objects + Infosets =Too Much Stuff!  blueArrow
10/4/2003; 4:47:01 PM (reads: 1206, responses: 0)
Perhaps we should focus this discussion and note that we are interested in programming language design, not database design.

And that makes all the difference...

I disagree with this dichotomy. Within a few years we'll start to see good sized systems with battery backed RAM and no disk. With that architecture, what's the difference between a programming language and a database language?

Patrick Logan - Re: Tuples + Objects + Infosets =Too Much Stuff!  blueArrow
10/4/2003; 5:07:06 PM (reads: 1208, responses: 0)
It means that relational systems can provably cover a problem domain, whereas there is no provability for OO. Indeed there is no mathematical theory of OO whatsoever... But the truth is that relational theory is the only database theory available. It works and works well. Relational databases dominate the marketplace and will continue to do so.

I'm not one to eschew theory. I think theories and especially models of them that can be checked should be applied wherever practical; and the range of applications is getting wider all the time.

I am not buying that there is no mathematical theory of OO. First, it depends on the definition of OO, since there are many and some are more formal than others. At the very least, the essence of Smalltalk could be described without much trouble in a denotational semantics.

But I don't think that would make Smalltalk significantly easier to build systems with. The complement of theories when building software systems is tests. Yes, theories can demonstrate the absence of some general problems, but often it is *sufficient* to demonstrate the absence of specific problems empirically.

At least while we wait for the theorem provers to catch up.

SQL is a weak implementation of relational theory. Prolog is an excellent implementation.

No argument here on SQL. I'm looking forward to it's whithering away.

The large systems I have seen in Prolog tend to use cut and in general take on imperative aspects that are outside the theory per se. I'd love to read about large systems that are more true to the theory.

...That's why the relational model was developed, but much of that is unknown to developers today. Those of us who know will just have to wait until enlightenment strikes a generation of developers.

Better yet, build large systems and brag about them.

When a 100 percent relational system can for example continuously schedule and track millions of shipments world wide, or track and dispatch power equipment across the eastern US seaboard following a hurricane, which I have seen OO systems do, then *I* will be the first in line to study such a thing.

Neel Krishnaswami - Re: Tuples + Objects + Infosets =Too Much Stuff!  blueArrow
10/4/2003; 7:09:22 PM (reads: 1201, responses: 4)
Does anyone know of any papers describing a relational algebra with transitive closures added? I've been thinking about writing a mail client that provides uses "virtual folders" defined as queries on a big bag of mail, but the relational algebra isn't powerful enough to express things like the thread structure that Message-IDs create.

Also, a tutorial on the math behind aggregate operators (like SUM and COUNT in SQL) would be much appreciated.

Patrick Logan - Re: Tuples + Objects + Infosets =Too Much Stuff!  blueArrow
10/4/2003; 7:19:07 PM (reads: 1200, responses: 0)
Does anyone know of any papers describing a relational algebra with transitive closures added?

The closest thing I know of would be Oracle's connect by addition to SQL, not really an algebra per se.

select lpad(' ',2*(level-1)) || to_char(child) s 
from parent_child_table
start with parent is null
connect by prior child = parent

Dominic Fox - Re: Tuples + Objects + Infosets =Too Much Stuff!  blueArrow
10/5/2003; 8:08:31 AM (reads: 1143, responses: 0)

See here for an overview of D4, Alphora's implementation of Date & Darwen's Tutorial D. No mention as far as I can see of transitive closure. It was noted in a previous LtU discussion (of JDO) that HaskellDB supports computing transitive closures.

Frank Atanassow - Re: Tuples + Objects + Infosets =Too Much Stuff!  blueArrow
10/5/2003; 8:53:57 AM (reads: 1132, responses: 3)
I scan through run-on paragraphs containing sentences about robbery, rape, Enron, and the GOP. Maybe you could summarize the essence?

You missed the all-important diatribe on breast size.

I think the essence is: SGML was a brilliant idea perverted by the hands of affable morons who, in the process of popularizing it, lost sight of its original purpose, and what we need to use rather than XML is LISP.

Yes, theories can demonstrate the absence of some general problems, but often it is *sufficient* to demonstrate the absence of specific problems empirically.

Sufficient for what? Turning a profit?

I think you see the glass as half-full whereas exponents of `theory' see it as half-empty. They want to raise the bar and see what comes out of it.

Incidentally, my perspective on the "there are too many formalisms" issue is: it's not that there are too many formalisms; it's that we don't know how to relate them. This is where semantics comes in. If, say, programming languages were simple and well-defined enough for us to automatically translate between them, then no one would have to complain about the fact that there are so many ways to express the same thing.

Have you ever heard anyone make an argument like this?: "There are too many ways to express `5' in this language! Look: 5, 0+5, 5+0, 1+4, 4+1, 6-1, abs(-5), trunc(5.5), ..." No, of course you haven't. But if you replace "integer" with "program" suddenly everyone wants one and only one way to express it. (BTW, the book `Proofs and Types' (see my topic post) starts off with this idea.)

The fact is that there will always be many ways to express any non-trivial idea. Think about coordinate systems for R^3, for example. The people who run the W3C think that the only way to address this situation is via standardization: make everyone use the same language, and, using network effects, reward those who follow suit and pressure those who don't.

Well, this is just a variant on normalization. But it's fundamentally in conflict with their stated goal, which is basically to decentralize information. They want to decentralize information by centralizing syntax.

Also, partly since they avoid assigning a Turing-complete semantics to any of their syntaxes, and anyways they are all so special-purpose that you wouldn't want to use them to solve more abstract problems, they've gotten into this positive feedback loop where they have to churn out four new standards to support the corner-cases---and sometimes not-so-corner cases---of every standard they produced they year before. Hence the explosion in complexity we are seeing.

Marc Hamann - Re: Tuples + Objects + Infosets =Too Much Stuff!  blueArrow
10/5/2003; 8:56:04 AM (reads: 1146, responses: 0)
Does anyone know of any papers describing a relational algebra with transitive closures added?

I don't have a paper to recommend, but I do have a problem or constraint that would have to be handled.

This operator can only be guaranteed to work (in "normal" set theory anyway) if you can somehow guarantee that the relation is well-founded (i.e. has no cycles).

I don't think it is a mere oversight that such an operator has not found its way into standard SQL... ;-)

Marc Hamann - Re: Tuples + Objects + Infosets =Too Much Stuff!  blueArrow
10/5/2003; 9:15:42 AM (reads: 1143, responses: 2)
Incidentally, my perspective on the "there are too many formalisms" issue is: it's not that there are too many formalisms; it's that we don't know how to relate them.

While I will agree with this as stated, I will nonetheless have to stipulate that changing this is a very difficult undertaking.

Normally a new formalism is introduced because it helps with studying a new phenonmenon or studying an old one from a new perspective. As you know, showing that two formalisms are isomorphic is generally considered a non-trivial result.

If you limit formalisms a priori by saying that they cannot be used unless they can be shown to map unambigously to an existing one, you are necessarily undermining the point of introducing a new formalism in the first place.

The real problem here is that the formalisms being introduced are not thought through even in their own terms, (as you say they have ambigous or undefined semantics) so it isn't even clear that they model their domain at all well, even without worrying about how they compare to other formalisms.

If you have to twist the domain to fit the formalism, or the formalism is itself incomplete or inconsistent, you are going to end up with a crappy analysis that is hard to use, and isn't very effective.

I suppose the one great virtue of forcing the semantics of the new formalism to reduce to a known, well-defined one would be to force the creators of the new formalism to think it through to that level of definition in the first place, though I would be just as happy if the semantic primitives were different, but well defined and consistent.

Patrick Logan - Re: Tuples + Objects + Infosets =Too Much Stuff!  blueArrow
10/5/2003; 9:27:13 AM (reads: 1133, responses: 0)
Sufficient for what? Turning a profit?

I think you see the glass as half-full whereas exponents of `theory' see it as half-empty. They want to raise the bar and see what comes out of it.

First, I am not anti-theory. That's not where I spend my time, and it's not really strong in my background, short of the typical courses in math and CS. But I am looking forward to the software industry becoming more grounded in theory-based models and less on "hacking".

Several times I have been involved in designing and standardizing concurrent, distributed systems, I used and promoted formal tools to the extent I felt able.

That said, what are less theory-based systems good for today? Well, "hacking" (in the most positive sense, not the criminal sense.)

Hacking is fun. Hacking can turn a profit. Hacking can result in good systems. Hacking in Lisp and Smalltalk has been test-driven for years. With the emphasis on agile approaches over the last few years, hacking in Java-like languages is also becoming test-driven.

Someday we'll have what I would call truly "model-driven" as opposed to the current definition which seems to mean "generate code from boxes and lines". Many of the pieces are in place and getting better. But I don't know how to build a large system this way nearly as efficiently as I can with just a little bit of theory and a simple tool like Smalltalk.

As always, I enjoy reading about improvements in this area (e.g. Peyton-Jones' financial contracts has a lot of promise), and I am sure I'm ignorant of some interesting results. But I am not in a position to be a trailblazer in this, which it seems is what's needed right now. Good examples of large systems.

Patrick Logan - Re: Tuples + Objects + Infosets =Too Much Stuff!  blueArrow
10/5/2003; 9:32:00 AM (reads: 1125, responses: 0)
But if you replace "integer" with "program" suddenly everyone wants one and only one way to express it.

I am in favor of multiple ways to express information and computation, so long as the differences are not gratuitous, cumbersome, and expensive.

Unfortunately for IT today, the differences typically have these characteristics since they have evolved over long periods of time, have been deployed by different groups of people, and come from vendors who benefit from proprietariness.

Frank Atanassow - Re: Tuples + Objects + Infosets =Too Much Stuff!  blueArrow
10/5/2003; 9:45:08 AM (reads: 1148, responses: 1)
While I will agree with this as stated, I will nonetheless have to stipulate that changing this is a very difficult undertaking.

If we only solved the easy problems, we would never make any progress.

If you limit formalisms a priori by saying that they cannot be used unless they can be shown to map unambigously to an existing one, you are necessarily undermining the point of introducing a new formalism in the first place.

I see. I guess you eschew compilers and interpreters, then; all they do is translate one formalism into another.

Tell me, do you also prefer pointers to be dangling? After all, if they actually point to something, why, we could just remove the indirection and everyone would be better off...

Marc Hamann - Re: Tuples + Objects + Infosets =Too Much Stuff!  blueArrow
10/5/2003; 10:33:14 AM (reads: 1149, responses: 0)
If we only solved the easy problems, we would never make any progress.

True. But if the problem is very hard, you can't hold it against people for not having solved it, or suggest they should sit on their hands until they do.

I see. I guess you eschew compilers and interpreters, then; all they do is translate one formalism into another

Note that I did not say that there are no uses for mutually equivalent formalisms, but rather that often formalisms are introduced for other reasons before it can be known if they are equivalent to others.

Your scheme suggests that necessarily all useful semantics reduce to the same primitives for all formalisms. Sometimes this is not desirable or is incomplete.

Let's say you are programming a robot to perform some manufacturing task. The semantic primitives of the actual domain are going to be things like "move your arm at joint x by y degrees", but the primitives of your programming language are going to be numerical encodings of that action.

The normal semantics for PLs stops at that encoding, so you could have a situation where your interpretive system is entirely reducible and self-consistent but meaningless with respect to the domain semantics because your mapping from the encoding to the domain is nonsensical.

So this leaves finding a universal encoding for the domain with all the desireable mathematical properties. This is where most of this stuff runs aground, since there may not be one single sensible formalism (or even any) that models all the desired functional properties of the domain.

This is fine as it goes, but most formalisms sweep this problem under the rug, which is, I believe, the basis of our criticism of them.

If they at least made some attempt to clearly define a well-defined, self-consistent semantics, we could at least make the effort to prove that they are or are not reducible to another formalism with known properties.

I think that where we differ is that you seem to be saying that unless it is so reducible, it is a bad formalism. Whereas I say that it may have its uses in its own terms, so long as it is self-consistent and well-defined.

Neel Krishnaswami - Re: Tuples + Objects + Infosets =Too Much Stuff!  blueArrow
10/5/2003; 10:42:38 AM (reads: 1116, responses: 1)
Mark Hamman wrote:
[Transitive closure] can only be guaranteed to work (in "normal" set theory anyway) if you can somehow guarantee that the relation is well-founded (i.e. has no cycles).

Oh, I see -- with a cycle the inductive definition of closure won't always terminate since defined sets don't always get smaller. That doesn't seem like it should be a problem for relational databases, though. We always have a finite number of elements in a relation, so we know all the infinite derivations must be cycles. This means that you can use memoization to avoid infinitely traversing them. In other words, we are moving from a definition of closure as a least fixpoint to a greatest fixpoint. This makes no difference for finite cycle-free sets, amd for finite sets with cycles we just snag the whole loop in the obvious way.

[edit] Gah, that first sentence was incoherent. It should say something like: "With a cycle, you can't inductively prove that the obvious definition of closure works, because the sets don't always get smaller at each step. This shows up in the straightforwad recursive implementation as the procedure looping on a cycle."

Marc Hamann - Re: Tuples + Objects + Infosets =Too Much Stuff!  blueArrow
10/5/2003; 11:02:42 AM (reads: 1119, responses: 0)
This makes no difference for finite cycle-free sets, amd for finite sets with cycles we just snag the whole loop in the obvious way.

In the latter case you have some practical implementation problems. Lets say you have a table with several million records. Each record has a the primary key and a self-referential foreign key. Record one refers to record 2 and so on until you get the last record which refers back to record 1.

You will ultmately catch that, but its performance characteristics are not going to be nice. ;-)

Take that problem and extend it to a chain of relations that are more remotely circular. You may be waiting a LONG time for that one to detect the cycle and terminate. Just setting up the tracking to know where you have been already might be non-polynomially expensive.

I think this is why the only case that is handled in some extensions of standard SQL (such as Oracle SQL) is the strictly hierarchical case, and then only to a limited depth.

Frank Atanassow - Re: Tuples + Objects + Infosets =Too Much Stuff!  blueArrow
10/6/2003; 4:07:59 AM (reads: 1036, responses: 1)
But if the problem is very hard, you can't hold it against people for not having solved it, or suggest they should sit on their hands until they do.

It isn't very hard to design the sort of programming language which keeps popping up in scripting circles so that it has a formal semantics. They're all Scheme knock-offs anyway.

It isn't very hard to design a markup language like XML so that it has a formal semantics. Parsing and language theory are well understood.

It isn't very hard to design a type system like XML Schema so that it has a formal semantics. Type theory is a well-developed field.

Your scheme suggests that necessarily all useful semantics reduce to the same primitives for all formalisms. Sometimes this is not desirable or is incomplete.

No, I only suggest that the parts of a language which genuinely have to do with computation can be factored out from the parts which have to do with I/O. Developing a good semantics for I/O may well be difficult esp. if you're dealing with some exotic domain like robotics, but developing a good semantics for the computational part should not be difficult at all unless you are doing something innovative, and, when it comes to programming languages, almost no one outside of PL research circles is.

I think that where we differ is that you seem to be saying that unless it is so reducible, it is a bad formalism. Whereas I say that it may have its uses in its own terms, so long as it is self-consistent and well-defined.

Well, usually it is neither consistent nor well-defined, but let's ignore that.

I agree that there are cases where it may be worthwhile to define your language semantics in isolation. But I don't think those cases pop up in practice unless you're doing really cutting-edge work. Even in those cases, one needs to justify oneself, by comparing one's work to related work.

If you have some fancy doo-dad which absolutely cannot be defined by translation to an existing, well-known language, then tell me why that is, or at least what is so unique and inexpressible about it. If people actually did this, they might discover their fancy doo-dad is not so fancy after all, and the problem of unnecessary diversity might not turn out so big in the first place.

Marc Hamann - Re: Tuples + Objects + Infosets =Too Much Stuff!  blueArrow
10/6/2003; 4:43:35 AM (reads: 1029, responses: 0)
It isn't very hard to design a markup language like XML so that it has a formal semantics. Parsing and language theory are well understood.

I think the problem here is that they weren't really setting out to design a markup language, but rather something like a universal data definition and exchange format. (Set aside that this might be a Quixotic undertaking for a moment... ;-))

From this point of view its primary design virtues are going to be those that make it seem "easy" to encode arbitrary domains. All the kinds of changes to XML that would make it more like a PL would probably take away from these properties.

This illustrates the tension between PL semantics and domain semantics. PL semantics must necessarily be computable, but domain semantics need only to be simulated by computation. The sacrifices needed to actually make it computable are often perceived as too great.

How many people are willing to give up the notion of the Reals in computation, for example? ;-)

If people actually did this, they might discover their fancy doo-dad is not so fancy after all, and the problem of unnecessary diversity might not turn out so big in the first place.

Absolute agreement there.

Neel Krishnaswami - Re: Tuples + Objects + Infosets =Too Much Stuff!  blueArrow
10/6/2003; 6:54:14 AM (reads: 1005, responses: 1)
Marc Hamman wrote:
[Regarding transitive closure] In the latter case you have some practical implementation problems. Lets say you have a table with several million records. Each record has a the primary key and a self-referential foreign key. Record one refers to record 2 and so on until you get the last record which refers back to record 1.

Doesn't SELECT DISTINCT have exactly the same issues? I mean, you need to do a sort or maintain a hash table in order to eliminate duplicates. And the memo table for a transitive closure can be a bitvector, which is only ~30K of memory per million records. I just don't see how the problems is as difficult as you suggest. (This is not to say that you're wrong; it's just that I don't see the obstacle.)

Since my mail client probably won't have more than a million records (emails) in it, I'll just go ahead and design it with a closure operator. If it gets to be a problem then I can remove it/deprecate it.

Marc Hamann - Re: Tuples + Objects + Infosets =Too Much Stuff!  blueArrow
10/6/2003; 3:17:48 PM (reads: 949, responses: 0)
Doesn't SELECT DISTINCT have exactly the same issues?

In that case, you need only consider "Is value X in the index/table?". In the transitive closure case, you have to consider arbitrary chains of values (does X lead to Y, which leads to Z... which ultimately leads me to A, which is under consideration)

Add to that the fact that we are talking about relations (there can be more than one branching chain from a value) and you quickly get a combinatoric explosion of chains that you would have to consider for a large table or set of tables.

In your particular project, the worst case scenario may not present itself, so it might work with acceptable performance. An RDBMS, on the other hand, must handle arbitrary cases, so it would run into these problems.

Paul Snively - Re: Tuples + Objects + Infosets =Too Much Stuff!  blueArrow
10/6/2003; 7:58:15 PM (reads: 922, responses: 0)
xeo at thermopylae: It means that relational systems can provably cover a problem domain, whereas there is no provability for OO. Indeed there is no mathematical theory of OO whatsoever.

Gee, I'll be sure to let the folks behind <http://www.kindsoftware.com/products/opensource/OBJ3>, <http://sdg.lcs.mit.edu/alloy>, and <http://caml.inria.fr> know that they don't exist.

andrew cooke - Re: Tuples + Objects + Infosets =Too Much Stuff!  blueArrow
10/17/2003; 1:14:56 PM (reads: 692, responses: 2)
do you know that there's a standard trick to map trees into sql by partitioning sets? i don't know if he thought it up, but celko has described it (google).

this representation allows you to get all the nodes in a subtree using "select ... where ... in ..." - it might help in your case.

[this in response to the email client db structure]

Chris Rathman - Re: Tuples + Objects + Infosets =Too Much Stuff!  blueArrow
10/17/2003; 1:48:53 PM (reads: 671, responses: 0)
Been using Celko's solution successfully for a while - highly recommended for dealing with trees in SQL. (Had one routine that was taking 45 seconds to get a large subtree using cursors that I managed to cut down to fractions of a second). I've been wondering whether this is the same sort of behind the scenes logic that Oracle uses in their CONNECT BY syntax?

Ehud Lamm - Re: Tuples + Objects + Infosets =Too Much Stuff!  blueArrow
10/17/2003; 2:31:22 PM (reads: 700, responses: 1)
Should be in Celko's book.

andrew cooke - Re: Tuples + Objects + Infosets =Too Much Stuff!  blueArrow
10/18/2003; 6:47:51 AM (reads: 694, responses: 0)
i bought another of his books - data and databases - and wasn't very impressed. it didn't seem to have anything very useful in. it may be that i simply don't know enough about sql, i guess.

incidentally, i'm not sure it's clear from my previous post, but that way of representing trees gives you transative closure (if i understand the term correctly). i don't see how it could be made to work for more general graphs, though.