Why prolog is by far the best, most productive, easiest programming language


Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.


You can't really avoid 'data structures' in Prolog or logic languages. They're necessary for describing complex domain elements, such as 'plans' or 'paths' or 'geometries' and whatnot that might be used whole in a relation. Without structure, you need to come up with some surrogate ID for each such element - which can be made to work for input, but doesn't handle so well for output.

There is much to be inspired about in Prolog (or its cleaner variant, Mercury). But I've never liked how Prolog integrates with side-effects or communications, and I also dislike Prolog's lack of a termination guarantee.

You state in a comment that you somewhat intend Prolog for use in meta-programming, and I can see how it would do some useful work there. You might also be interested in learning Term Rewrite Systems, such as Maude or OBJ, which generalize upon logic programming systems (allowing the developer to essentially tweak the logic, too).

I think people are making a

I think people are making a mistake when they use prolog for anything other than creating an ontology-like thing
-if you're using it explore a large search space it's the wrong tool (gradient descent/genetic algorithms/constraint programming are better tools)
-if you're trying to think in terms of loops/recursions operating over data structures because you have a certain fast algorithm in mind it's the wrong tool (use functional or procedural programming)

But, what i'm saying is, even the above should be expressed and reasoned over in prolog (metaprogramming) as abstract entities. In other words create (or add to your existing) 'expert system' every time you want to solve a problem. Side effects or Mercury with its static typing don't make sense in the context of that. You're not exploring search spaces and so termination doesn't come into it.
Programming = metaprogramming = organizing information = database.

The competition for prolog is all the model driven architecture/code generation/uml, ide's (like eclipse), ontology editing tools (like protege),frontends to databases,and scripting langauges, and i think prolog easily beats them.

RE: 'making a mistake'

I'll grant that other utilities will likely be superior when dealing with large search spaces. I'll also grant that logic programming is not ideal for algorithmic reasoning - and certainly not for side-effects! But these caveats you're including do much to undermine your bold claim that "prolog is by far the best, most productive, easiest programming language". To a casual observer, your argument reduces to: "prolog is best wherever prolog is best".

I'm not especially familiar with MDA or UML, but - for example - you name 'code generation' as a place where prolog is superior. By your own above caveats, you're wrong: code generation has a wide search space and a large variety of fitness functions (space and time performance, energy consumption, bandwidth expenses, latency, predictivity for real-time or embedded systems, portability, etc.). As you suggest above, some combination of generic algorithms, constraint programming, gradient descent, etc. may be superior for this task.

Anyhow, termination does 'come into it' - even without exploring search spaces. Even when just defining an ontology and applying it to reach a conclusion it isn't difficult in Prolog to muck up the order in which definitions or entailment rules are listed and end up with a non-terminating computation.

Static typing - the ability to determine when a statement or definition doesn't 'make sense' - is also of considerable utility.

Three points: There was a

Three points:

  • There was a whole logic-meta programming movement a few years back at from a research group in Belgium that specialized in Smalltalk. But it hasn't seem to gone anywhere, and the meta-programs look a bit convoluted at best.
  • I don't see Prolog has being very useful, but other domain-specific logic programming languages have been a bit more successful because of their limited scope (e.g., datalog).
  • For me, the advantages of logic programming are obvious yet frustratingly elusive. The problem is...logic that we can encode is too inflexible and stupid, whereas humans can reasoning is the exact opposite. Until we can encode a flexible logic, I don't see much hope.

There was a whole logic-meta

There was a whole logic-meta programming movement a few years back at from a research group in Belgium that specialized in Smalltalk. But it hasn't seem to gone anywhere, and the meta-programs look a bit convoluted at best.

Is that the right link? It is for British Columbia's research group. Did Belgium takeover Canada?

I saw the Logic AOP project, but the only paper on that sub-part of the site was printed upside down...

The only other thing I saw was Apostle, which integrates Aspect weaving into a dynamic language (Smalltalk) via incremental weaving. Unfortunately, Apostle had this funny/sad comment:

Although stable, Apostle is not suitable for real system development, but it can be played with to investigate the relevance of AOP programming in the Smalltalk world. Sadly, the source code was lost due to a laptop theft; there is an image for VisualAge for Smalltalk 5.5 image with bytecodes, available by request.

A grad student learns the importance of distributed version control and non-local backups! Doubly awesome is that person is now, six years later, interested in researching distributed VCS's from an HCI standpoint.


For me, the advantages of logic programming are obvious yet frustratingly elusive. The problem is...logic that we can encode is too inflexible and stupid, whereas humans can reasoning is the exact opposite. Until we can encode a flexible logic, I don't see much hope.

And what about using term rewriting for encoding logic? It is capable of expressing not only metaprogramming but metalogic...

printed upside down...

That's hilarious! Thanks for pointing that out, Z-bo.

[edit]: Fortunately, if you actually print it, you can always turn the papers around. Adobe seems able to do the same.

Printing to see reflection

Maybe I'll just use a mirror :)


We know how much you like reflection, but I suspect that reading in a mirror will only add to the difficulties. ^_^

RE: TRS for Logic

what about using term rewriting for encoding logic? It is capable of expressing not only metaprogramming but metalogic...

I don't believe this will actually resolve the issues Sean faces. TRS certainly allows us to describe more complex logics and expressions, but Sean wants the ability to represent the sort of 'flexible' logics seen in humans. Humans aren't all that logical: we make illogical leaps, use analogy, view systems in many different models in parallel, and learn for better or worse by reinforcement; we imagine and predict. We also deal with deception and suspicion. Even the most rational of humans works in terms of 'justifiable' conclusions rather than the 'logical' or 'correct'.

To represent the 'flexible logic' processes seen in humans takes an inherently illogical system, but one that can integrate multiple models, multiple resources, multiple agents, multiple perspectives for any given problem. Chances are, these systems will be maintained and upgraded by different groups, will need to be plugin extensible, etc. Blackboard systems are a useful basis for such integration, but one among many. I once saw a presentation of some work by Applied Systems Intelligence (in Georgia) that described how they needed at least seven different knowledge and logic models to encode the many ways of 'thinking' and 'understanding' necessary to reach useful conclusions within time and embedded space limits... and every one of those models needed to integrate with databases.

TRS has a few sticky challenges when it comes to such integration of many different models and external resources. TRS expressions carry their whole model around with them. I suspect it could be made to work, but I'm still stuck on how to do it efficiently and effectively and in a manner users can easily grok.

TRS expressions carry their

TRS expressions carry their whole model around with them.

It is true that I'm unaware of what is being done here to solve this problem, but I think I need to tackle it if I want to use a Maude like language to make a case for model-driven graphics as a route towards ending the GPU API Wars. I was actually reading papers about this stuff today in the context of massively parallel rendering.

My opinion simply stated is that TRS should not be used to create Controller nodes. Haskell has this same sort of problem, but it is less evident to the untrained eye: lazy induction makes it seem like term rewriting doesn't have the Algorithms = Control + Data requirement; the compiler is still executing some sort of fancy graph algorithm. The developer still has to provide a flow of control, and that flow of control is the basis of lazy evaluation. Thus, such laziness does not transparently solve problems such as action refinement, mobility, disruption tolerance and system stabilization. -- You still have to model these problems directly so that your translation engine can 'map out' the agents correctly. Today there is way too much negotiation between the compiler and developer, and the developer always bows to the compiler's decrees. I don't have the right terminology for describing this yet, however. It is likely somebody has already thought of these issues, but my research has been an extremely wide journey across many areas of computer science, including some interesting critiques of the von Neumann model.


Is that the right link? It is for British Columbia's research group. Did Belgium takeover Canada?

I believe a member of the Belgian group had moved to Canada: see the project page.

A grad student learns the importance of distributed version control and non-local backups!

Obviously, he did have a non-local backup, since he still had the Smalltalk image.

And VisualAge Smalltalk, being a self-contained system that dated from late 80s/early 90s, used its own (centralized) version control system, ENVY.

BTW, the survey of DVCs you mention dates from 2009, seven years after the work on Apostle, and three years before "now".

Program layer and Service layer Logic Programming

Your comments are consonant with my understanding.

Datalog hits a sweet spot because it doesn't stretch itself: Datalog is complicated enough to include simple domain models yet represents a relatively simple logic: it lacks negation and defaults (or priority) and any number of other complicating factors), and always terminates up to input, and always produces a single answer set. It works well with laziness - can handle both backwards and forwards chaining to fulfill the queries demanded of it. It fits very nicely into a functional programming language to operate over first-class sets or relations.

I include a variation of Datalog in my programming language at the functional strata - it's almost perfect for FRP combination of database resources (essentially, each Datalog-like expression produces a database, and may observe multiple databases). In the past, I've tried adding to my Datalog variant such concepts as negation and defaults, but in the presence of recursive implication these can produce multiple answer-sets (based on order of default, order of negation). One can reason over multiple answer sets (confer Answer Set Programming) but that's not something I'd voluntarily inflict on myself or other users. Explicit negation also allows datalog expressions to become 'inconsistent', which is non-trivial to check for in the case of distributed expression (as common in FRP). (I'm still contemplating some support for defaults with a restriction against recursion for the defaulted element.)

Anyhow, while datalog hits a sweet spot, it hardly approaches the greater depths of logic programming and the sorts of 'flexible logic' we might favor on the large scale. It has become my opinion that the major advantages of logic programming are realized in conjunction with large databases of domain knowledge and ontology. That is, both the interpretation of a logic query and the answer to a logic query may vary over time based on manipulations and refinement of both the ontology and the data resources. Maintaining these massive ontologies and data resources is non-trivial, inherently distributed, and subject to all sorts of security and market constraints. Therefore, advanced 'flexible logic' programming is infeasible without effective support for secure, distributed programming. We need build services that answer DSL-style queries via reference to amalgamations of services: databases and data distribution services, expert systems and blackboards to integrate them, search, complex event processing, etc.

Even something as simple as code production - compilation of code into another format - can leverage all the aforementioned resources and systems. Given enough knowledge of hardware resources, for example, and even of the other programs already running on the hardware, one could appropriately compile a program to take advantage of precise cache sizes, FPGAs, GPUs, properly select the right allocation and paging sizes based on HDD vs. Flash for persistent memory, etc. etc. etc. This takes a lot of data, and a lot of support from well-designed intelligent systems (which are unlikely to be hosted or maintained locally). Ah! as an aside, I was able to locate an interesting website that I haven't viewed for a few years: Declarative Meta Programming is an inspiring source of information on the subject of code generation using declarative techniques.

To achieve flexible logic programming, we must start with a programming system that offers ability to effectively compose and mash-up distributed systems. Prolog is too low-level to realize the potential of logic programming. Datalog is also far too low-level for said task, but its limited scope allows it to at least serve as a useful primitive in the development of greater systems in the same sense that simple functions or reactive objects might serve.

My own research into FRP started with an interest in how to securely and effectively integrate such high-level services. In addition to security, I wanted to make four guarantees on this integration: (1) anything you can query without side-effects, you may also subscribe, so as to minimize polling; (2) fully demand-driven from end-to-end (even across intermediate CEP systems, iterative constraint solvers, and registries*), such that sensor networks don't waste effort providing data that isn't necessary to a side-effect somewhere on the far side of the network; (3) disruption tolerance, such that if a resource goes down one can easily fallback to alternate resources or use a cached version, and such that (for SEP) if you miss a few events over brief disruption a short history can catch you up; (4) scalable multi-cast distribution of data (Akamai style). Beyond those four guarantees, distribution of the processing element to the data resource is also desirable (for performance, resilience, and disruption tolerance). Also, flexible composition of expert systems, databases, and other high-layer resources generally requires updates be separated entirely from queries. Barring a few exceptions involving fully 'reversible programming' (which is an admittedly very interesting subset), one cannot compose arbitrary external data resources if one must support 'update' semantics! FRP supports all these properties, even including decision on how to most efficiently distribute processing elements across a cloud of resources. (*: I support demand-driven properties across non-FRP systems is supported by reflecting 'demand' itself as an FRP capability emitted by every new pipe or cell. This allows proverbial forests to 'pause' whenever nobody is around to hear the trees fall.)

So can we achieve 'flexible logic' programming? I believe so. But I believe attempting to develop a "logic programming language" at the primitive layer is a mostly wasted effort. Our 'flexible logic' programming will look more like a service connection to Cyc or Wolfram Alpha, albeit with far more fine-grained distribution and many more providers due to issues of security and market. Expert systems will eventually mash up as easily as websites do today, or perhaps even more so.

I think with the above

I think with the above you've described Tim Berner's semantic web vision.

Semantic Web

Any attempt to make progress on what we have today will look similar, in some respects, to Semantic Web. You should not just lump them together, though, because the differences can be significant.

I do not believe that Lee's Semantic Web - neither the technologies nor the vision - will 'succeed' (in the sense of becoming easy to use and performing well). Lee describes the SW as an extension of the current Web (and, effectively, of the current process, OS, and code-distribution model). What I describe is closer to an out and out replacement of both the current Web and OS design (even dropping DNS and Filesystems), albeit allowing adaptor components between the systems to support incremental migration. Lee describes SW as allowing widespread processing of data that is essentially held within a common 'cloud' (the 'Web') - and he basically ignores security and market issues - whereas I favor designs that carefully control which resources each element may access: no common names, but access controlled through secure capabilities and smart contracts.

You don't become a language or OS designer without accepting, at some level, that small changes in the basic elements of a design can have a significant effect upon the emergent system - its feasibility, performance, safety, security, usability, productivity, maintainability, marketability, mechanisms for composition, and so on. We don't always know what that effect will be, of course: our models are limited, especially on the human side of the equation. But many designers pursue subtly but significantly different visions and accordingly make subtle but significant tweaks in the design.

The declarative

The declarative metaprogramming site you found seems to be like to what i'm describing, they talk about using prolog to generate programs in other languages

'Pure' Prolog?

'Pure' Prolog?

DMBarbour's programming language

You mention 'my programming language' that includes a variation of Datalog. Is that a public project?

That programming language

That programming language has gone through about six major revisions now, to hammer at design bugs that hindered distribution, failure handling, scalability, or security. A datalog-like subset for working with sets has been with it through three of those versions, and is even more important in the most recent one (where use of sets is prevalent).

Only two of those revisions made it all the way to a workable prototype. The rest have failed on paper. If you're looking for something workable to play with, I have nothing to offer.

I very recently rejected actors model as a basis for my language due to the difficulty in integrating continuous data, live programming, and issues with disruption and network partitioning, and survivable networking via mobile agents. Actors also require a lot of state (message queues, explicit caching, hooking frameworks for callbacks, and to decouple communications from time), which scales poorly. I had switched to transactions for concurrency control to avoid certain denial-of-service attacks and improve scalability (beyond what serialization offers), but transactions were a real pain to integrate with hardware and third-party services.

But if it wasn't for the work on the runtime architecture aiming to overcome some of these flaws, I'd not have found a superior replacement.

My new design is based on Reactive Demand Programming, which uses continuous, concurrent, idempotent, reactive demands rather than messages or events. Agents are always active. Overall, the paradigm simplifies a great number of my user-stories, and pretty much blows actors out of the water. But it is new (the architectural simplification that seeded it happened back in early April 2010).

This time I'll be using Haskell. I haven't written a line of code yet, other than to scrub rust off my Haskell skills. I'm reading a bunch of Oleg's articles. I do have a basic RDP vocabulary that remains secure and supports major runtime optimizations. I've also finished the refurbishing of the older module system for RDP (secure replacement for FFI; runtime is plugin extensible and authorities to each service can be defined. To simplify debugging, it is also hot-pluggable and hot-swappable).

It seems Haskell's libs are missing a few critical bits of what I need. For example, I haven't found a safe model for sealers/unsealers (which I really want to represent as pure functions after construction).

The project is public, by the name Awelon. If you're interested in helping out, or simply hearing more about the architecture, I have a nice long e-mail I wrote for someone else that I can share, and a long list of relatively orthogonal tasks to perform.

In any case, the datalog-like subset is even more critical in the new RDP design. Sets are provided output from the 'demand monitor' primitive and thus are prevalent in the system, and datalog is what I choose to query sets - without imposing any particular order upon them. I'm wondering if I can do this right in Haskell: I'd like lazy evaluation of datalog-like queries, which might cross multiple sets, while supporting both forwards and backwards chaining and a reasonable level of data-parallelism.

You may get people to help

You may get people to help you implement it if you explained more clearly what the language is going to be like. A tutorial that is written for people who would use the language is a good way to explain it (it should be >50% example programs; just words doesn't convey what you have in your head well enough to get an idea of how an implementation might work). I found the language description on the Reactive Demand Programming page vague.

Please do share the email!

Reactive Demand Programming

RDP isn't a language; it's a programming style, or architectural style ;-).

That notwithstanding, I agree that more examples would help. Unfortunately, my Awelon syntactically appears much like any impure functional language; the difference is with regards to pluggable program composition, concurrency, continuity, reactivity, and idempotence of effect. None of that will be 'visible' in any small example; instead, it is implicit to the environment. That means what I really need are pictures (which I lack the tools to create) and compare-and-contrast code that shows in other paradigms the massive 'framework' that RDP avoids (which I lack the discipline to create).

Maybe, before finding people to help code my project, I should find people to help explain it for me, or translate my thoughts into a concise summary that other people will understand. Really, the idea of RDP is so simple that I suspect it was invented back in 1970, then forgotten. :-)

My realistic expectation is that I won't receive help until I have a working prototype, an installation package, and some 'visible' demonstrations. Perhaps hot-swapping between GLUT and GTK windowing would suffice, though that would be one of the lesser features of the Awelon runtime design. Or maybe I'll need some Silverlight-style projects, or interaction between multiple machines.

Anyhow, one doesn't need to grok RDP to help out with related projects. The Awelon runtime involves plenty of work that would be useful for any distributed programming language with high stability, security, and resilience requirements.

I'll get you the e-mail later.

You did seem to have a

You did seem to have a language in mind about 3/7ths down the page, but the exact semantics of the constructs were not clear to me.

Even if the advantages of the language features can't be shown in small examples, the semantics probably can.

Nowadays you can get a prototype of a language with full blown libraries up and running very quickly using the DLR on .NET. This saves you the enormous hassle of implementing file I/O, GUI, networking libraries, garbage collection, etc.

It would be great if you could describe a simple subset of a language that has the essential ideas in RDP, like lambda calculus + state is to Scheme.

Assume agent F is confined

Assume agent F is confined and stateless. There might be three independent demands on agent F for F(X), F(Y), F(Z). If so, the actual return values are F(X,{X,Y,Z}), F(Y,{X,Y,Z}), F(Z,{X,Y,Z}). If the agent requesting Y later leaves, then the observations change to F(X,{X,Z}) and F(Z,{X,Z}). Thus, the observed behavior is not just a function of parameter you provided. Instead, it is a function of both the input and of all current observers.

If F had references to other agents, then the function grows more complex - hidden demands, hidden models. F may also be stateful, keeping memory of prior demands and its own observations. Things get especially interesting if X, Y, and Z might contain references to agents. In the large, F can't really be considered a 'function'; rather, one must understand F as an agent with a purpose, fulfilling a role.

The advantage of RDP lies in the fact that F only rarely needs to be stateful since a great many uses of state are replaced by either these shared observer-effects (which can be used for database, registry, resource and authority acquisition) or the more general features of agent-oriented programming (which supplies implicit caching and integration into update framework). RDP also avoids the fragility of message-passing in open systems, where queuing without loss and without duplication (especially during network partitioning or temporary disruption) is non-trivial.

Once you start using these 'observer-effects' to drive the system (to power on cameras, to raise wrench effort), 'observer-effect' seems an inadequate name. The system is demand-driven. Thus, I call them 'demand-effects'.

All agents are making simultaneous demands on the system. There is no implementation language I know of for which this is just a small semantic tweak. So I'm using Haskell. :-)

Re: libraries

I can't think of many libraries that would work effectively with RDP's 'concurrent, continuous demand' concept without a significant adaption layer. Exceptions would include scene-graph systems and other data-driven programs.

[edit:] I suppose I could adapt the imperative libraries by use of factory pattern to create bunches of 'transient' agents to represent individual method-calls and workflows to orchestrate them. RDP is certainly the most simple and expressive basis for 'orchestration' or 'workflow' patterns I've ever seen (cf. Orc and YAWL). But that is NOT the style of programming I aim to promote; I do not consider such to be robust or scalable. I'd rather keep such patterns to a bare minimum of in-the-small programming where nothing else will suffice, and instead adapt more edge-systems to be data-driven or demand-driven.

Anyhow, I'm basically reinventing the code-distribution architecture, especially including integration with foreign function interfaces.

Even if I have GUI libraries, I don't wish to use them directly because that would be an expression of ambient authority; instead, I represent various GUI services as agents preexisting the introduction of my own (creating authorities via configuration), and implement them via runtime-pluggable modules. I may also include authorities to remote systems in my configurations, allowing me to 'program' remote systems locally. Overall, principles of security, live-programming, distributed programming, and persistence all tangle together - none of them truly practical without the other three.

Code distribution occurs when I allow agents or code-fragments to mobilize and live near the resources they use, or near the systems that make demands upon them. It doesn't need to be automated (i.e. I could 'demand' that a remote agent to which I personally have authority execute a particular script to spawn a new remote agent), but automating this allows developers to focus on the goals and constraints around distribution, and can be more precise about which code-fragments are transported - resulting in more distribution, and less duplication of code, than a human is likely to achieve.

Is this how an interface to

Is this how an interface to a database would work:

    type Command = Get ID | Insert ID Data

    F(Insert 3 "Foo")
    F(Get 3) -> "Foo"

If this is how it works then (1) How do you handle multiple inserts (or updates)? How do you determine which one gets precedence over the other? (2) How do you plan to implement this efficiently? A database (and presumably other actors as well) needs to store the data in its own data structures to be efficient, so you don't have to linearly search over the list of commands that are currently active.

Languages as Databases

How do you plan to implement this efficiently? A database needs to store the data in its own data structures to be efficient, so you don't have to linearly search over the list of commands that are currently active.

I grant more database-like qualities to the language runtime. If a language is already intended to be persistent and distributed, why not support first-class sets and first-class relational queries? Awelon does this, providing a simple datalog variant for complex relational operations.

MyGraph={e(1 2) e(2 3) e(2 4) e(4 5)}
TClose G = 
  query Q: e(A B) <= G.e(A B)
         | e(A C) <= Q.e(A B),G.e(B C) 
PathExists G A B = MemberOf(e(A B),TClose(G))
assert PathExists MyGraph 1 5
assert Not(PathExists MyGraph 4 3)

A set can represent a whole relational database, using a tag per relation. :-) ([edit:] This 'whole set' approach is more flexibly typed and syntactically convenient than would be using a record of relations: one never needs to express empty relations.)

Ideally, a high-quality implementation of Awelon would recognize large sets, index them, support lazy queries, query optimizations, forwards and backwards chaining. A lower-quality implementation of Awelon might still index sets after they reach a certain size: I am thinking that some combination of the 'rope' and 'kd-tree' concepts would be suitable for a naive set-indexing technique - just index everything, no basis profiles or developer annotation.

But it will be a while before the runtime is at high level with regards to professional database-scale set storage and processing performance. In the meantime, you can introduce an agent that represents an external storage resource - such as an SQL database. In Awelon, this is written into a project-layer configuration, along with the necessary capabilities, and requires a proper module be available to the runtime that will provide the service. (If you need more than one such object, without knowing how many in advance, you might instead use RDP's variation of the factory pattern.)

The performance problems are, in part, simply avoided: most agents never directly observe their demands as a set. Instead, they filter, organize, and delegate their demands. This tends to result in smaller, specialized sets of demands, and also allows a great deal more replication and code-distribution to lift the burden off of any one processor. In Awelon, agents must explicitly spawn primitive (demand, monitor) agent pairs to locally observe current demand.

I'm still lacking a satisfactory approach to set aggregator functions... in particular, ensuring determinism independently of order of evaluation and parallelism. I really hate the idea of defining some set of aggregation primitives (sum, min, max, etc.), so what I'd really like is to statically prove that an aggregation function has the necessary commutativity, associativity, and identity properties.

How do you handle multiple inserts (or updates)? How do you determine which one gets precedence over the other?

Multiple inserts are handled by multiple demands on the agent representing the database. All demands are simultaneous. So three agents inserting several items might look like:

Agent Q: F(Insert 3 "Foo") F(Insert 4 "Bar") return \X -> F(X) Agent R: Q(Insert 5 "Baz") if("Baz" = F(Get 5)) return \X -> Q(Get X) else become (...Agent Body Here...) Agent S: F(Insert 3 "Bwe") Q(Insert 5 R(4)) return \_->()

All three agents are continuously and concurrently active. All expressions are continuously active, excepting only those hidden behind a conditional. If we were to round up all the initial demands on F, it would look like:

 F(Insert 3 "Foo") -- from Q
 F(Insert 4 "Bar") -- from Q
 F(Insert 5 "Baz") -- from Q from R
 F(Insert 3 "Bwe") -- from S
 F(Insert 5 "Bar") -- from Q from S
   -- "Bar" in last is from R from Q from F
 F(Get 5) -- from R
 F(Get 4) -- from Q from R from S

RDP dumps all of these concurrent demands onto F, which it may observe as a set (using primitive 'demand monitor' agent pairs). It is up to F how to interpret and handle the set of demands. F gets to decide whether the mutual inserts on IDs 3 and 5 represent some sort of inconsistency. F decides how to handle the inconsistency. In the case of ID 5 - "Bar" vs. "Baz" - the decision is relevant because it will affect the future state of Agent R.

In practice, the developer of F would be more likely to decide this isn't a consistency problem, and reflect this in its type by returning a set: F(Get 5) -> {"Bar", "Baz"). OTOH, if choosing a single value is truly necessary, developers would structure their programs accordingly. There are many strategies for resolving inconsistencies:

  1. Arbitrary decision, i.e. based on lexicographic ordering. Simple, deterministic, works well in practice.
  2. Restructure to avoid ambiguous precedence. Resolve inconsistencies and set priorities at intermediate agents, following secure capability-model composition patterns. Allow responses such as: "your inputs are consistent; get them right before I'll pass them on!". Not so different from form validation.
  3. Consult a third-party. F has access to other agents representing expert systems, operators, users, administrators, decision-makers. F can consult with the ambiguity; this can result in logs, dialogs, SMS messages, e-mail, and so on. A response can be used to resolve the inconsistency.
  4. Use memory. Agent F could use 'become' to transition to new states based on observed conditions. This allows first-demand-wins and last-demand-wins policies (and corresponding race-conditions). However, won't work if inconsistency all observed at once, so you need another resolution mechanism (arbitrary choice, or workflow).
  5. Develop workflow patterns. Fine-grained agents may represent queues, semaphores, mutual exclusion, transaction managers, and so on. While concurrency is default, simple conditional expressions - and occasional 'become' transitions - will suffice for any pattern to synchronize or sequence demands. But be careful: voluntary patterns such are not especially robust to adversarial or buggy agents.

"The problem is...logic that

The problem is...logic that we can encode is too inflexible and stupid, whereas humans can reasoning is the exact opposite. Until we can encode a flexible logic, I don't see much hope

What about BinProlog implementation which supports hypothesis construction ? (not only deduction as in usual prolog systems - but abduction also implemented).

I see here shifting towards human reasoning more closely. Do you ?


I found your blog post and this ensuing reply to be very cryptic. Do you intend to follow-up this blog post with examples, or is this blog post argument the end-of-the-line?

Here is some advice.

Use 'detail sentences' more. Your points are frustrating to read because they communicate no stream of thought, but rather seem to indicate feelings and emotions. You're presenting conclusions, not a thesis or even body of discussion.

Motivate your opinion with examples in Prolog and compare to what the system would look like in a paradigm other than one limited to Horn logic. e.g. How do you do code generation in Prolog? Does backtracking or a cut operator help with the task of code generation? What is it that you are code generating? How do you verify that you are generating the right code?

Explain to me why more general logic programming (Maude) or less general logic programming (Datalog) is inferior, and why Prolog captures some mysterious sweet spot.

Perhaps explain Concurrent Prolog vs. Prolog, and motivate with a discussion of object capability systems.

Perhap explain Prolog vs. Mercurcy, and the convenience of a typing discipline that makes it easier for developers to work on larger projects.

What parts of Eclipse compete with Prolog? How do you expect the average Prolog programmer to compete with the optimized analysis-to-design translation services provided by Eclipse modeling framework's code generators? Is there a similar library for Prolog tuned by expert Prolog programmers who know a lot about spitting out efficient code?

What parts of UML compete with Prolog? Explain how you can restrict a 'profile' of modeling elements in Prolog. How do you do model checking and executable specifications? Model refactoring?

For MDA, please compare with the leading MDA vendors, such as vendors that provide translation simulators. Here is a list of MDA vendors for you to compare basic Prolog to: Bridgepoint (Mentor Graphics), PathMATE (Pathfinder Solutions), xUML (Kennedy-Carter), and Rose/RT (IBM). I would particularly recommend a comparison to Bridgepoint, as its IDE provides a full VM for model translation. How do you simulate translation from a Platform Independent Model to a Platform Specific Model? Tucked away in your post is the implicit position that Prolog is therefore suitable for real-time/engineering. Please relate your experiences in Prolog to how you've used it in real-time systems.

What ontology editing tools compete with Prolog?

What frontends to databases compete with Prolog? How does Prolog compare to Datalog? How does Prolog compare to SQL? How does Prolog compare to other query languages? Look at the Business System 12 retrospective by Hugh Darwen for a rough sketch of how one leading database researcher summarizes how one language he designed compared to a bunch of others.

For scripting langauges, please compare Prolog to bash, Perl, PowerShell and Tcl.

What about performance, such as the fact modern Prolog implementations implement multiple constraint solvers so that the end-user must be an expert in knowing which constraint solving system is suitable to their task? How does such dependencies affect the portability of Prolog?

Those are valid points, I'm

Those are valid points, I'm making an emotional argument and so on. I'm not a programming language researcher or expert in any of those code generators/mda tools.

However, I'm coming from the point of view of a mere mortal computer user who wants to get the computer to do stuff. From this viewpoint the tools I have seen that are easieast to use are spreadsheets,relational databases, the ontology editor protege, and scripting languages perl/python/ruby/tcl. Among these, relational databases appear to me the most elegant, useful, robust and simple. Hence my choice of prolog which I think has a cleaner syntax (I'm not exactly sure what datalog is but it might be actually the language among i'm advocating but there aren't free implementations with documentation available like swi prolog as far as i can tell).

Put another way, if you consider that I as a domain expert in some activity, such as medical diagnosis or legal reasoning, want my computer to help me (without the expense of going through a programmer), I think the best choice would be a database/expert system of some kind (here is where prolog/datalog wins).

Now, what is the difference between a domain expert who is tasked with developing a medical diagnosis system (a doctor), or one who is tasked with developing a computer program (a programmer). I would argue not much. Both have large/complicated set of heuristics they would like to tell the computer (and so they should both using that same tool - relationdb/prolog).

you have valid points as well

i think there is a lot to be learned from the 80% cases. the problem is that when you get past those, things can quickly go only 1 of 2 routes: huge horrible hacks upon hacks, or unraveling it all and starting over the right way.


You might find Flora-2 interesting.

I looked at flora-2 and I

I looked at flora-2 and I think it kind of veers into a different direction. What I like about prolog/relationaldb's is the fact that everything is a table and when i go up to do something, I only have to think about joining/unifying across these flat things.

The crumbs under the table

Good points, and I did not mean to deny you the proverbial seat at the table. We're all seated at the table. I am just inviting you to have me as a further listener, by requesting real world examples of how you use Prolog.

Now, what is the difference between a domain expert who is tasked with developing a medical diagnosis system (a doctor), or one who is tasked with developing a computer program (a programmer).

The difference, since the 1990s (and perhaps earlier), has been that there are no longer "Programmer/Analyst" job positions. This job title is effectively extinct today, because there is much greater emphasis on having both a domain expert and a programmer. We might even separate this further by inserting in a "product manager": somebody who can talk to each in their own native language. For a medical diagnosis system, I've never heard of a doctor also being the programmer... Instead, the domain experts are used to validate the accuracy of the expert system (to ensure things like blood disease diagnosis is not incorrect, and that the patient does not receive the wrong medication).

Put another way, if you consider that I as a domain expert in some activity, such as medical diagnosis or legal reasoning, want my computer to help me (without the expense of going through a programmer), I think the best choice would be a database/expert system of some kind (here is where prolog/datalog wins).

I'm not certain judicial reasoning and proof lends itself well to Prolog. I am only aware of two full-length books on the subject, neither of which I have read: Judicial Applications of Artificial Intelligence and The Dynamics of Judicial Proof: Computation, Logic, and Common Sense. It seems like except for DOCU-PLANNER, custom systems were built to support legal reasoning. I'm also aware of Peter Tillers' website dedicated to MarshallPlan which is, as far as I know, the first software program to allow users to do heuristic analysis of very large amounts of evidence. I mainly was interested in this because J Storrs Hall mentioned the application of artificial intelligence to judicial reasoning in his book Beyond AI, but did not provide any references. My curiosity stopped when I confirmed it was true, and so don't expect me to be especially knowledgeable here :) I am just attempting to understand your point.

dmbarbour mentioned above

dmbarbour mentioned above the site declarative metaprogamming http://soft.vub.ac.be/DMP/ and they happened to choose prolog for doing their metaprogramming so there must something to it.

"The difference, since the 1990s (and perhaps earlier), has been that there are no longer "Programmer/Analyst" job positions. This job title is effectively extinct today, because there is much greater emphasis on having both a domain expert and a programmer."
If that is indeed true, then I regard that as more of a failing of the software tool industry/academia than a success (perhaps they're not emphasizing enough how good and easy to use relationaldb/prolog is!). Why hasn't programming become another basic literacy just reading,writing,arithmetic (see Alan Kay) that professionals use as part of their everyday work.

Modern Domain Model Analysis

Modern analysts with computer skills use modeling software and tend to focus on principles of stable design, such as: (1) set-based classification of real world entities into things called "Classes" (2) model-driven engineering, where the compiler does the bulk of the "programmer" work. Relational theory is as important as ever, since good model checking software depends on analysts who understand third normal form.

Some punditry: I think it is important to realize in programming language theory that much of the important math will eventually be resolved (just look at the advancements at the POPL and PLDI conferences each year), and the main issues that remain will be portability of models and ensuring that sharing a model (and therefore porting it to the next person's computer specifications) maintains or improves its nonfunctional characteristics. At the same time, we're seeing fewer and fewer interesting papers from conferences like OOPSLA, and now that OOPSLA is re-branding itself as SPLASH, the writing is on the wall for the classical notion of "programmer". (The Visual Studio team already hires C.S. M.Sc. graduates to be product testers, foreshadowing the day when IDE's will be built solely by team's of Ph.D.'s) What's especially telling is that, a few years ago at OOPSLA, there was a birds-of-a-feather discussion to talk about how to handle the multicore kerfluffle, led by Brian Goetz. According to Goetz's report of that BoaF, nobody had any clue how to help the average programmer write parallel programs. Yet hardware folks have been working on personal supercomputing since the early '60s (led by Gerald Estrin), and there was even a 7 year research project into parallelism in the '80s about how to apply massively parallel techniques to systems theory (including exotic prototype concepts such as neural network-driven compiler optimizations [which have unique advantages traditional pipelined compilers do not]). Likewise, things like Lamport clocks, that require distributing the clock to all components in the system or coming up with schemes to avoid distributing a clock, are being reconsidered in favor of clockless and concurrent designs (e.g. asynchronous circuit designs).

"Modern analysts with

"Modern analysts with computer skills use modeling software and tend to focus on principles of stable design, such as: (1) set-based classification of real world entities into things called "Classes" (2) model-driven engineering, where the compiler does the bulk of the "programmer" work. Relational theory is as important as ever, since good model checking software depends on analysts who understand third normal form."

I always thought that object-orientation (classes) and relational were on opposing teams (object/relational mismatch), and I always thought the object folks were just plain wrong. If you look at the results, most useful (commercially used) software is essentially a relational data model and queries (with stuff on the end like scripting languages to prettify the result). I kind of disregard operating systems/databases/browsers because they haven't produced anything noticeably improved since their first incarnations in the early 70s/90s (the man hours have translated into not all that much - maybe because of their use of c/c++ and object orientation).

I always thought that

I always thought that object-orientation (classes) and relational were on opposing teams (object/relational mismatch), and I always thought the object folks were just plain wrong.

Object-oriented languages with "classes" are not necessarily model-driven engineering "classes". Model "classes" are really just a set-based classification scheme; object-oriented programming languages confuse this with concepts such as "implementation inheritance" and procedural method resolution that allows a subtype to delegate implementation concerns to a supertype, in turn conflating the notions of type and class. Likewise, issues such as collaboration are not addressed by set partitioning, but rather by collaboration schemas (typically expressed via a collaboration diagram); object-oriented programming languages instead achieve collaboration via block-structured procedural message passing, which makes things like clockless, concurent, asynchronous, distributed implementation awkward and error-prone and why model-driven compilers implement this for you. Likewise, due to the linear store of the von Neumann model, many object-oriented programming languages distinguish between composition and aggregation (which is a slippery slope that must immediately address new problems such as ownership of the resource and resource lifecycles). In fact, the first version of UML distinguishes between composition and aggregation as well, principally to allow reverse engineering source code written in object-oriented programming languages like C++ (improving the accuracy of model extraction). Fortunately, UML 2, with MDA Profiles, has cleaned all of this up. Unfortunately, UML now looks beyond bloated to the untrained eye and it takes a real expert to understand all the layers and surface volume. Also, object-oriented programming languages tended to encourage ambient authority (e.g. C++ "friend" construct), which is antithetical to object-oriented modeling, since objects themselves are the simplest and most uniform construct for building secure distributed systems; object-oriented programming language elements like "static" and "import" thus interfere with truly object-oriented principles like having a nonlinear word-datum limit*. As a consequence, the theory of naming in most object-oriented programs violates encapsulation of the "objects" in these languages, in the sense that these languages do not provide extreme hiding and protection of state-process, because you should only be able to invoke an object you can name, and the object should only respond if you have the authority to ask it to do something.

I attribute the misunderstanding between classes and types to Liskov's Substitution Principle paper, which many accidentally mistook as an axiom of object-orientation rather than a derived lemma. For example, Lieberherr's Adaptive Object-Oriented Software programming paradigm, based on the principles of adaptive grammars, actually has little to do with developing real-time object-oriented embedded systems. Lieberherr qualified his use of the "object-oriented" buzzword in part by stating his system allowed preservation of the Liskov Substitition Principle, yet this was only true in a very twisted sense. (This is not meant to disparage Lieberherr, as many of his ideas, such as combining dictionaries and dependency propagation graphs, have ended up being re-invented by Inversion of Control/Dependency Injection libraries such as .NET examples NInject, Autofac and StructureMap in the form of buzzwords like "Profiles", "Autowiring", and "Contextual Binding"; in doing so these IoC libraries scrap the Demeter project's useful visualization language, powerful asset querying language, and convenient syntactic DSL. On the other hand, the key improvement made by these IoC/DI libraries is runtime containment for provisioning resources to subsystems, which allows hotswapping at the expense of not being able to directly reason about the effects of hotswapping on things like caching. Hotswapping turned out to be such a popular feature, due to the startup time of many large applications, that the weaknesses have so far been almost totally ignored, to the point that many engineers conflate hotswapping with run-time modularity.)

Also, the importance of third normal form in modeling is independent of implementation. Not putting knowledge responsibilities into normal form will necessarily lead to unnecessary guards, and in any complex system, guards hidden behind interfaces will be the cause of architectural decay as it explodes the state-process based on individual elements' peculiarities rather than sets of elements. No mathematical discipline is interested in studying individual elements on an ad-hoc basis, and object-oriented modeling is not an exception.

The major problem of OOPSLA research in the '80s and early '90s was that things were happening so fast that few researchers were closely examining implications. They made overpromising claims rather than trying to distill the nature of good software. Some of the brightest minds, like Henry Lieberman, published only a paper at OOPSLA and retreated back into their academic strongholds (Lieberman has always been profoundly influenced by MIT's AI culture).

* To the best of my knowledge, the thorough benefits of a nonlinear word-datum limit were first enumerated by JK Iliffe in his Basic Machine, a theoretical machine described in his book Basic Machine Principles. However, I have been searching for earlier prior art - tagged architectures existed before then, but a thorough treatment of how they compared to linear store architectures is hard to find.

Well most of that went way

Well most of that went way above my head so I can't refute it, but I will say, when you said "Unfortunately, UML now looks beyond bloated to the untrained eye and it takes a real expert to understand all the layers and surface volume.", it seems to me that something is wrong with the industry. If the point of higher level languages is to make programming easier than assembly, well, this is like going back to assembly.
Also, regardless of what the exact meaning of uml is, whether it's object oriented in the sense of OO languages or not, I still think the relational db level abstraction is better than uml. I know of at least one 'non-programmer' (a managerial accountant) who writes sql queries as part of his job.
When you wrote "Not putting knowledge responsibilities into normal form will necessarily lead to unnecessary guards, and in any complex system, guards hidden behind interfaces will be the cause of architectural decay as it explodes the state-process based on individual elements' peculiarities rather than sets of elements.", I'm not exactly sure what you mean, but it sems like you agree that relational model is better as well.


Also, the importance of third normal form in modeling is independent of implementation. Not putting knowledge responsibilities into normal form will necessarily lead to unnecessary guards

Can you recommend a book or article about this? I never heard about this before.


Executable UML: How to build class models by Leon Starr. However, don't expect Leon to approach the subject by directly explaining normalization. Leon's position is that if you simply look at the problem domain and start to uncover entities and think of responsibilities for those entities and how other entities might interact with that entity, then you'll typically arrive at third normal form. He gives a couple examples, such as modeling graphics applications that manipulate rectangular graphics.

Towards Executable UML - Code Generation From Interaction And State Chart Diagrams by Martin Marinschek, Ph.D. dissertation:

When designing xUML classes, their attributes and
identifiers, the relational principle of the first normal form, second normal form and third normal form should be obeyed – making the close coupling between xUML and relational principle even more obvious. An introductory explanation of these concepts can be found in [=>Normalization], and in [Sta02, p. 61ff] they are applied to xUML; although not explicitly mentioned.

[Sta02] is referring to Starr's book. Executable UML (xUML) mainly derives from the Schlaer-Mellor methodology of "modeling the world in states" and "modeling the world in data".

Also, H.S. Lahman, one of the best modelers I've ever seen, routinely mentions 3NF on usenet://comp.object (e.g. RE: How to Introduce OO to Structured-Method IT Person?) and has a blog post dedicated to the role of normalization in the modeling process. As H.S. Lahman points out, normalization critiques of OO systems are rare, because a poor design is usually the result of not doing abstraction based on a problem domain, but rather the data coming to the messages (instead of the messages coming to the data).

A separate philosophical question is whether this is The Right Way to think about systems theory. As I've already mentioned, adaptive grammars argue otherwise, but adaptive grammars also have not been used in systems theory (that I am aware of... perhaps someone like John Shutt could point us to something, though, as he is quite knowledgable on the topic of adaptive grammars) and so the full implications of adaptive grammars applied to very large scale systems has not been adequately explored (IMHO).

As an aside, more modern object-oriented programming languages, like Scala, have added support for dealing with temporal consistency via incorporating traits into the type system; traits are the most natural way I know of to encode a GoF State Pattern into a textual programming language, but the downside is that you have to encode actors to asynchronously transitition from one Role trait to the next. So it is pretty evident that textual programming languages and system modeling theory will eventually converge.

Declarative Meta Programming != Prolog

That project doesn't necessarily use Prolog, through and through.

Sofie Goderis, for example, is a Ph.D. student whose work I try to follow. She has published articles in the Journal of Object Technology, such as: DEUCE : A Declarative Framework for Extricating User Interface Concerns. I really like her stuff, and here is one great quote from the aforementioned article:

Existing UI approaches, including the ones based on the Model-View-Controller (MVC) architecture [23, 17], support a limited form of separation of concerns (see also section 7). More precisely such approaches focus on separating UI visualisation (in a view) from the business and data logic (in a model), and neglect the extrication of UI behaviour. As a result, evolving the UI's behaviour still requires browsing the source code, manually adding new UI behaviour and subsequently connecting it to the business logic where appropriate. In addition the code handling for the contextsensitive nature of the UI needs to be scattered throughout the application as well. This is why we advocate a built-in solution that covers the separation of all concerns in an integrated manner.

Like Sofie I am personally also interested in using constraint programming for user interfaces. However, I do not quite believe that declarative meta programming as practiced in DEUCE is the right way forward. Lately, my interest in GUIs has led me to recommendations by Allan McInnes and Charles Stewart to look into Hyperedge Replacement Grammars (thanks Allan) and Constraint Multiset Grammars (thanks Charles). Constraint Multiset Grammars already have novel applications to graphics applications in the area of pen descriptions; I am still wraping my mind around HRGs so I don't know if they're useful or not.

As a comparison, Sophie uses SOUL and the Cassowary constraint solver as the basis for DEUCE. Leo Meyerovich and I recently had a discussion about Cassowary on LtU. See: Computational complexity of cascading stylesheets?

Comments on posts in this sub-thread

Z-Bo your initial questions to the OP are great, and his response that he's making an emotional appeal explains a lot.

I have a lot of sympathy with him as a user. Much of our advanced PL-foo is really not that useful to the typical domain expert who is just trying to get something useful done. Are module systems and types part of the accidental or essential complexity of the situation? To the OP, they are clearly not essential. But raould's point about what happens after the 80% cases is also important, because there is a big difference between solving domain problems and building large computer systems. But most PL people don't understand this, because the only programs they write (or can imagine writing) are big PL systems (compilers, type checkers, theorem provers, model checkers, etc).

After that the discussion between the two of you gets pretty crazy. Do you really maintain that there are fewer interesting papers at OOPSLA than POPL and PLDI? That kind of sweeping comment isn't very helpful. But you are right that OOPSLA is focusing on the concerns of the classical notion of "programmer", and that this is evolving. Z-Bo, you make other strange comments about object-oriented programming languages. Your comments focus too much on inheritance and not enough on interfaces. Also, you seem to be assuming that OO = Java which is clearly not the case. Your other comments, such as the connection between Lieberherr's work and dependency injection are interesting. But you are jumping between concepts so quickly that few of us (if any) can keep up. The mixing up of classes and types goes back to SIMULA, I think. Smalltalk and its progeny "fixed" the problem by eliminating types, but Java still hasn't gotten it fixed. More recent languages, including Go and Grace, take another stab at fixing this problem. Please do not ascribe the flaws or limitations of a few languages to the entire spectrum of OO.

I completely disagree with the idea that objects and relational databases are on opposing teams. The "object/relational mismatch" is NOT a mismatch of information structures. Both objects and relations have very similar structure, where classes are analogous to tables and relationships analogous to references. [Edit: "relationships" for "relation"] The fact that UML Class Diagrams are based on Entity Relationship Diagrams should make this clear. The mismatch is a algorithmic mismatch between sequential one-at-a-time processing in PL and bulk processing in SQL. The fact that Haskell needs libraries to talk to SQL (like HaskellDB, which was the basis for LINQ in C#) is because you have to lift the queries out of the functional program to get them to execute efficiently in the database. This solves the problem of "functional/relational mismatch". In other words, the problem is PL/DB mismatch which can arise in any PL. Even Prolog can have this problem, I think, unless they have done work to integrate tables into the computational model and extract queries automatically.

See the original paper that described impedance mismatch:
David Maier. Representing database programs as objects. In Francois Bancilhon and Peter Buneman, editors, Advances in Database Programming Languages,
Papers from DBPL-1, pages 377{386. ACM Press / Addison-Wesley, 1987.
or my paper Integrating Programming Languages & Databases: What's the Problem?

I agree that UML has gotten completely out of control, and is misguided in several ways. UML was a useful starting point for model-driven development, but us becoming increasingly marginal as we make progress. If you do some searching, you will see that there are lots of effective critiques of UML, and also many better ways to do modeling. I have said UML is the worst thing that ever happened to MDD.

marshallp, I think you are raising great points here. Unfortunately the title of your post made me ignore it at first. If you haven't done so yet, you should read "Out of the Tar Pit". Actually, everyone should read this. I love the first half, and agree in spirit with the second half, although I prefer Entity-Relationship modeling to raw SQL tables. That is a small point.

Both objects and relations

Both objects and relations have very similar structure, where classes are analogous to tables and relations analogous to references.

Sorry, in what way are relations analagous to references?

The best way IMO of dealing with OO style classes in a relational setting is the approach OWL and description logics take. Classes are unary relations and attributes are binary relations. You can then denormalise from there to combine many functional attributes into a single relation. This makes the binary graph structure of OO clearer and explains why you end up having to reify so much when mapping a relational schema (hypergraph) into OO land. It also handles inheritance cleanly.

Maybe off topic, but isn't

Maybe off topic, but isn't the impedance mismatch moot these days with en masse movement from relational to KV stores and graph databases? Actually, with graph databases, the mapping between objects and data becomes trivial; after all these are very similar to the OO databases of the past.


Well, if there is a mass move to graph and kv stores, it ain't happening near me as far as I can tell. Most places I know are sticking with their big Oracle installs. The NoSQL approaches look like they've still got a way to go for mass adoption. If they did see better adoption, I would mostly avoid them. I prefer a more expressive foundation for my data and OO/binary directed graphs isn't it. With any luck though we'll see more expressive languages (eg Datalog) built on top of these new dbs, and we can replace SQL with something that's actually better.

I'm really out of my element

I'm really out of my element here, but since I work in a systems team (working on Bing-scale problems), its been a long time since I've heard of SQL as being the solution; its also being avoided for a lot of the new phone and web apps (talking to startup engineers, reading hackernews). Someone explained it to me that SQL was used as a hammer for a lot of nails that it wasn't really appropriate for (but it was the only hammer), and now their is a contraction away from these unsuitable use cases (and perhaps over contraction in some cases, it will balance itself out later).

Graph being limited to binary relations is a feature, not a limitation, from a performance perspective.

You aren't saying anything new

Go to Minnesota and the Charles Babbage archive and dig up "The Great Debate" at the 1974 SIGMOD between Codd and Bachmann. Everything you are saying was said by Bachmann. Bachmann based his experiences, up to that point, on "high performance" database requirements. So why did the Relational Model dominate for 2+ decades?

But I think it's crazy to use startups as an example for good software engineering practices. They're not engineering for data quality, but rather quantity of users. Putting PageMaker on the Internet is no big win at all, because that's what these startups do.

My honest hope is that these two views will eventually converge. After all, consider how BigTable and MapReduce and SSTable are used today. For the sake of civilization, we need to build something better than those current practices. Something better will require treating behavior as data and utilizing that to design storage requirements. We just don't think about synthesis at this level of detail yet. We will be forced to.

For what it is worth, I once had a summer project that took performance counters from a DB and OS and automatically would identify performance issues -- nobody wants to have to interpret graphs or charts, they want a fortune cookie to pop out and tell them what to do. I was so annoyed at how limited I was in going the extra step and not being able to build a self-healing, self-optimizing storage system. I wanted to reason about truly complex requirements, and nothing was available to help me do so.


In my experience, no mapping between objects and persistent data stores is "trivial" for anything but a toy schema.



1) What is the value of "moot" here?

2) The so-called Schema Mapping Problem is AI Complete. Have Confucious hit you on the butt with a paddle, and think for a bit.

3) Current (functional) query languages for NoSQL data stores still have open research problems that are solved by SQL.

4) Anyone who has thought about optimal data storage at all would do well to re-read Codd's seminal essays, and the debates between Bachmann and Codd. Everything espoused by NoSQL will eventually, hundreds of years from now, be subsumed by Codd's approach or a more general approach like David Spivak's.

Impedance Mismatch != Data Mismatch

I'll repeat it again: Impedance Mismatch is NOT primarily an issue of data type incompatibility! Why does everyone thing that this is the point?? Most programming languages have rich enough type/object systems to represent relational data or graph data without much problem. I'll repeat it again: Data representation and mapping is not trivial, but it is NOT THE CORE PROBLEM. The key problems are:

1) database optimization is based on whole-query optimization. A sequential program that traverses a database accesses data one item at a time, so there is no way to perform whole-query optimization. A functional program that does not delineate "the query" from the rest of the program can have the same problem.

2) databases are logically remote, so communication of results from the database to the program must be optimized to avoid communication latency. Again, one-at-a-time processing does not facilitate bulk transfer of data.

These are two of the key characteristics of impedance mismatch. Its about the flow not about the data structures involved. Similar issues can arise with the user of transactions.

I've been saying this for a while. It's consistent with the history of the topic. For example, has anybody noticed that early OO databases still had impedance mismatch with OO languages!! Just because they both used objects didn't mean that they solved the query, bulk transfer and transaction problems. Thus they didn't really address impedance mismatch at all. Declarative transactions as in MTS helped. But they later introduced JDBC-style execution of OQL into an OO language talking to an OO DB, to "solve" the query problem, but introduce so many new problems that the impedance is still mismatched.

I mean, if you think I'm wrong on this, let me know.

Well, if you view the

Well, if you view the database a just a big persistent data structure and map it as a big bag of records, then sure it's easy. But SQL can represent a sizeable subset of first order logic. The idea that OO languages are representationally adequate is a big assumption. It seems to me that eg Java EE has decent solutions for the problems you mention, and yet people still have problems adequately mapping their database schemas.

Doing It Wrong

The database schema shouldn't even matter to clients, and vice versa (the client language should not matter to the database schema)*. Sounds like people have mapping problems because they don't understand abstraction. Is it possible they have non-mappng problems as well, but lack expertise to tell the difference?

Beyond the schema mapping problem, beyond the view update problem, beyond the Impedance Mismatch, there are other issues. Such as those described (quite wonderfully) in William Kent's Data and Reality.

* Unless, of course, we are speaking about the generic case, where the application is a Query By Forms or Automated Visualization of Relational and Cube data stores, such as Tableau Software's tools like the TableLens.

Schemas Matter

Schemas will always matter, idealism that they shouldn't notwithstanding. Data models are naturally heterogeneous, usually in incompatible manners (data loss on any conversion). And even for lossless conversions, most representations are not homeomorphic (i.e. isomorphic in a manner that also respects continuity and modularity). Even if we can map data to data, that's pretty useless in a world with live, streaming, or encapsulated data (isomorphism is not enough).

I said that

Google "Schema Matching Problem", another term for what I called "schema mapping problem" at least 3 times already in this thread due to forgetting the correct term.

There is also a versioning problem. There is also a problem cleaning data, e.g. to merge dirty or ambiguous records into canonical records, such as Post Office Normalization.

There are algorithms and heuristics for dealing with these issues.

As I said, I believe most programmers are not experts and struggle to reason about even the basic theoretical limitations they bump up against. (Programmers I work with don't even know how to do gzipped chunked encoding.) Schema Matching is jokingly referred to as AI Complete, in that the only real solution is human's matching data and verifying it over time. Various solutions assume some uncertainty, although not necessarily the same kind of uncertainty, and none are perfect.

AI Complete

AIs will certainly be part of completing this - i.e. via unsupervised machine learning.

Abstracting the schema

The database schema represents the domain model. It is to a first approximation a logical signature of your domain. When you say it should be abstracted, are you implying that the domain model itself can be abstracted, such that users don't have to care about it? If so, I'd be very interested to hear how you propose to do that!

Funnily enough, I was just skimming Data and Reality this week. I haven't yet given it a full reading. My first impressions were that it makes some decent points, but the field of knowledge representation has moved on a lot since then.

I've seen it done different ways

A simple proposition: A (relational) database schema represents the universe of discourse. It has no inherent meaning (domain model). Adding domain-awareness would actually weaken the theory.

I would like to hear some concrete examples of the problems you have seen. It is always nice to reflect on failure, but as I said to Cook already in this thread, I have learned the importance of knowing why I succeeded in the past. I worked on a project supporting 100 different database schemas, and it was no sweat at all thanks to "software factory"-style approaches and careful partitioning of features to avoid combinatoric explosions. The problem domain was international K-12 school information management systems, where each customer had a different grading system, class scheduling, grading period organization, regulatory requirements, development module, etc. We had explicit solutions for versioning, data cleaning, etc. built into our software. We were still missing some features I've come to appreciate from reading LtU, such as solutions to the expression problem e.g. we didn't have a good way to change the representation of a data type displayed on a form, so our display engine was a bit of a hack in that the number of representations was deliberately fixed because we didn't know how to do better. I can't say if the people I worked for were geniuses or idiots, but I can say they never lost a customer (100% customer retention rate for SaaS is unheard of, as far as I know). Think about what that means in economic terms: generally if you are a factor of 3 more expensive than your competitors, you will get replaced no matter how good you are, but we were able to be reasonably affordable. The top private K-12 schools in the entire world were using our software, to boot.

I have also been thinking about the points William Cook is making (caveat: David Maier, author of the impedance mismatch paper, co-authored a book with Zdonik, a colleague of Cook's graduate advisor, Peter Wegner. In that book's introduction, they argued a system could have only 3 of 4 attributes: value substitution, static type checking, mutability, and type constraints. Date & Darwen, in the Third Manifesto, wrote a different take on the matter, organizing design mistakes into "the first, second and third great blunders" and arguing in Appendix G of their book that Zdonik and Maier assumed support for object identifiers. Date & Darwen's Third Manifesto cannot support object identifiers, as it would subvert their solution to the so-called "first great blunder".)

I am not sure I wholly agree with Cook. OO models abstract knowledge responsibilities into normal forms, which means data + behavior. Relational models abstract data responsibilities into normal forms, which means data and no behavior. Furthermore, I have a tough time fitting my preferred way to do things into standard OO vocabularies like DDD, etc.

But one thing I do agree with is that partitioning barriers dramatically affect software architecture. The fact that SQL is generally written ahead of time and compiled, really limits what you can do, because it is a partitioning barrier in two senses. One, you have a fixed set of classes on the client side which can consume a fixed set of queries in the middle end. Two, the user is limited to triggering a fixed set of inputs (this limitation, I would argue, is where the versioning problem comes from). This impacts the design choices DB vendors make as well, in ways most programmers don't even realize. Consider the design of the Tabular Data Stream protocol for SQL Server [MS-TDS] and [MS-SSTDS], as well as ODBC, JDBC, and ADO.NET drivers. There's no concept of "remote batch invocation" as Cook and Kiselyov have separately demonstrated recently.

Once you go that far -- realizing the effects of such partitioning barriers -- you should start asking interesting mathematical questions about reflection. For example, is it possible to get more out of something than you put into it? How DO you bootstrap a system to allow users to talk with a database? In this sense, the number of inputs to an interactive computation can change over time and vary by end-user interaction. Such open-ended communication then brings up interesting performance concerns nobody thinks about today because they are too busy shoving things into quick fixes like graph databases (as Sean McDirmid admitted, that's the only systems approach he thinks of -- alternatives don't even enter his mind). For example, how do you throttle users so they don't ask the DB questions that would kill the QoS for other users? The typical approach is to used stored procedures and control access via these procedures, thereby managing the performance problem rather than solving it. In this way, we have never had to worry about the computational complexity of a query OR pre-modeling its execution performance on hardware prior to running it.

Anyway, this is probably not the answer you were looking for, it's a bit outside the box and lofty, sorry if this just frustrates you. We can talk more offline, rather than clog up this thread more.

A few brief thoughts

I'll add a few short notes here. I haven't yet fully digested your reply, and you're right that we should probably take this offline rather than clog up this thread. I'll be away for most of the next two weeks though.

A simple proposition: A (relational) database schema represents the universe of discourse. It has no inherent meaning (domain model).

Well, yes, I'd agree that database schemas have no inherent meaning. They're just first-order signatures. I believe that the relational model is designed to admit a single unique minimal model, though, but I don't have a citation for that to hand.

I would like to hear some concrete examples of the problems you have seen.

Well, there are relatively trivial things, like having to reify relations with arity > 2 into entities, which can feel a little clumsy and very clumsy when there are "triangular" relationships (e.g., "illegal actions are those that are performed in a jurisdiction that has a law against that action"). More difficult in my experience is that you cannot represent some kinds of inferential knowledge in the OO representation, and so you fall back on comments (e.g., "this entity maps to the view promotion_candidates which is defined as all employees who have received 'Outstanding' ratings in their last 3 performance reviews and are currently at the top end of their pay grade."). Indeed, any complex logic is usually hidden behind views (which are effectively opaque from the application point of view), or encoded in some query language stored in embedded strings (e.g., JPA named queries). You can view this as encapsulation, and that the application doesn't need to care, but in practice I find these abstractions (if you can call them that) tend to be very leaky. Generally, I view these as symptoms that you are encoding your application logic in a language that is very weak at encoding such logic.

Sorry if these examples are a little abstract!

Got it, this makes a lot of

Got it, this makes a lot of sense. Impedance mismatch is basically the same problem we have when we try to program a GPU using CUDA or crunch lots of data on a cluster using MapReduce. Graph databases, when efficiency is considered, also have that problem since we would write map-reduce-like kernels to get any kind of decent performance on a huge graph.

Related, recent article in

Related, recent article in Queue by Erik Meijer:

In the database world, the raw physical data model is at the center of the universe, and queries freely assume intimate details of the data representation (indexes, statistics, metadata). This closed-world assumption and the resulting lack of abstraction have the pleasant effect of allowing the data to outlive the application. On the other hand, this makes it hard to evolve the underlying model independently from the queries over the model.

As the move to the cloud puts pressure on the closed-world assumption of the database, exposing naked data and relying on declarative magic becomes a liability rather than an asset. In the cloud, the roles are reversed, and objects should hide their private data representation, exposing it only via well-defined behavioral interfaces.

The programming-language world has always embraced such an open-world assumption by putting behavior at the center of the universe and imperatively building reliable applications from unreliable parts. Object databases are back with a vengeance. To help developers transition to the cloud, the existing class libraries and tool infrastructure need to evolve to run as highly available services exposed using regular object-oriented programming language interfaces that reflect the relevant operational details.


How does JDBC "solve" their problems? Curious for the history on this.

My main gripe with JDBC, coming from somebody who has written an ORM, is that most JDBC implementations hide the critical tuning parameters from the programmer, leading to inefficient ORMs and false/unnecessary design compromises.


I just meant that API that lets you execute query strings does let a programmer optimize the communication to the database, but the resulting code is usually a mess. People do it because it solves the big problem of database interaction, even though it introduces a host of smaller problems and additional accidental complexity.


Sorry, I meant to type "relationships are analogous to references". You can certainly atomize relations into triple-stores or explode the graph structure in other ways. My point is that objects can represent these structures fairly well. As others have mentioned in this thread, data modeling itself, and the choice of representation for data models, is far from trivial. But noting that it is complicated (and requires tradeoffs) is not the same as saying OO and relations are in opposition.

Most of what I said is a giant mess

Thanks for forcing me to re-read that giant mess. Must be torture for others.

Shocked by most of what was said.

I agree I jumped from topic to topic too quickly.

My comments focus too much on engineering mistakes and not enough on simply the right way to do things.

But, to be fair, I was 25 when I wrote that giant mess... 3 years really helps slow down the thought process and form better thoughts.

We can probably discuss more offline.

LtU post order

Hi Z-Bo I just wanted to point out that some of your posts that appear above this one were actually posted later, and so the "giant mess" doesn't apply to them :-)

I have also noticed lately that it is generally better to focus on describing correct solutions than (over)analyzing engineering mistakes. But maintaining the right focus requires some discipline.

Pure Prolog?

When you mention the reserved words needed for pure Prolog, you include ! (for cut), which I think one doesn't usually think of as ‘pure’. Do you have some non-standard definition in mind?

By 'pure' prolog I mean the

By 'pure' prolog I mean the logic and procedural features of prolog, - (remove the recursion/lists/map etc.)


I must be missing something. How is induction not part of logic? And how is recursion not procedural?

I think that means ....

... that if you want to write pure Prolog using a standard Prolog interpreter, you can't give ! its own meaning: it is a reserved predicate.

For a narrow set of problems...

Prolog is well suited to a narrow set of problems. I once wrote a hardware system configurator, where you specify parts you need, and the system adds all the necessary power supplies, wiring harnesses, etc. required. That part of the system worked very well in Prolog.

Writing the menu system to go with it in pure Prolog was a horror. That's an imperative problem, and Prolog is terrible for imperative problems.

Compared with SQL, SQL is a language understood by smart database query optimizers which figure out, based on what indices are available and how big they are, how best to perform a query on large data sets. Prolog is a dumb inference engine which uses the same dumb strategy on in-memory data every time. That doesn't scale well.

Discrete event systems

Because it does not seem to have been mentioned so far I would like to add that Prolog and its variants can also be useful in playing with different "discrete event systems" like finite state automata on words or trees (I'm aware of some work that uses tree automata in XSB Prolog to speed up some parts of a model checking procedure) or Petri nets among others. When working with those kinds of models you'll be interested in asking questions like "What state(s) will be reached in response to a certain transition sequence?" or "Given two states, what are the transition sequences going from here to there?" If the system is implemented in Prolog you'll get this kind of forward and backward reasoning essentially for free. But I will readily admit that you also run into the problem of big search spaces pretty quickly, but then you did not have to invest an aweful amount of time and effort into coding up your model.