Looking for references on the expressiveness and computational completeness of a relational programming language

The basic operations of a relational query language are equivalent to first order predicate calculus, which means they fall short of being computationally complete. This applies equally to modern SQL excluding SQL/PSM (which adds a procedural language) as it did to Codd's original proposals.

I have created a language Andl which implements the relational algebra augmented by (a) the ability to compute new values that are not in any relation (b) ad hoc aggregation functions (c) ordered queries (like SQL windowing) and (d) recursive fixpoint queries (equivalent to transitive closure or SQL Recursive Common Table Expressions.

I speculate that without any explicit looping or branching constructs, and based on relations as the only holder of state, that this language is Turing Complete. At any event it is capable of a surprising range of interesting computations (such as a Sudoku solver).

I am looking for references or theoretical work, or other implementations, to pursue this question.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Related implementations

The 'out of the tarpit' paper describes a variant on relational programming. There are some implementations of that, e.g. Chris Granger's work on Eve. And there is also 'Rel' from Dave Voorhis (though I haven't paid attention to it in years). And K/J languages might be relevant.

Edit: Looking at Datalog, Dedalus, Bloom is also good.

I think that given (countably) infinite relations, like `successor` on natural numbers, together with transitive closure, you can achieve Turing completeness.

re implementations

Yes, OOTP is a great paper and has inspired some projects including m36. https://github.com/agentm/project-m36.
Yes, Rel continues to move along. Worth a catch-up.
No, I don't know much about 'that' Eve. Will look into it.
No, 'K/J' doesn't ring any bells. A link?
Yes, Datalog has many useful ideas.
Daedalus? Solving equations?
Bloom? Disorderly computing?

If succ()+transitive closure gives you Turing Completeness, where would you look for research on that?

Basic machines

With infinite relations, transitive closure, and any simple 'small step' Turing complete model (like cellular automaton), you can query for a future state that meets some criteria (like a halting condition).

K and J are collection oriented programming languages with entries on Wikipedia. K in practice makes heavy use of column structured databases.

For Dedalus, I was referring to the predecessor of Bloom. I actually find it more interesting than Bloom.

Generalised Transitive Closure

Relational algebra is lacking the ability to calculate the transitive closure of a relation. I believe adding a generalised transitive closure operation to relational algebra's existing five (restrict, project, union, intersection, cross-product) would result in turing completeness.

try to implement

My recommendation is to try to implement Turing machine in your language. Then you will know for sure.

Why transitive closure?

Based on the 'Alice' book http://webdam.inria.fr/Alice/,
I have implemented 'while'. This is fixpoint iteration, comparable to SQL CTE RECURSIVE, and to transitive closure.

I'm looking for other work on or around this topic; either other implementations or academic research underlying it.

Relational Algebra plus Transitive Closure is Pure Prolog

Why Transitive Closure, because otherwise relational algebra cannot answer queries like find the least common ancestor (given a table of parent/child relationships). Relational Algebra plus Transitive Closure is computationally complete.

Interestingly logic has equivalent expressive power to relational algebra, and Prolog has recursion, so the pure variant of Prolog is effectively relational algebra plus Transitive Closure. So if you are looking for a declarative query language that is Turing complete and has minimal syntactic overhead then Prolog (or Datalog) is the answer.

If you consider that Prolog only has clause definition as its only syntax, that is pretty amazing expressive power.

Complexity of Datalog

Pure Datalog (without negation) is equivalent to relational algebra without set difference and with added recursive queries. In this case all queries terminate (in exponential time), so it is not Turing complete. Adding back in negation (whether stratified, well-founded or stable-model semantics) increases the expressive power considerably to classes like co-NEXPTIME, but not to full Turing completeness. You have to add function symbols to get semidecidability and thus Turing completeness. See e.g. this survey of the computational complexity of various logic programming variants. There is an extensive literature on this for Datalog and also for other logic languages such as description logics.

Don't datalog programs

Don't datalog programs execute in polynomial time? Or are you talking about the complexity as a function of the length of the datalog program?

Data vs program complexity

I was referring to the program complexity (or combined complexity), which is as you say. For data complexity (fixed IDB) it is indeed polynomial time, but it seemed better to quote the more dynamic case where the program can itself be part of the input.


Posted in wrong place.

Of Course Prolog is Turing Complete

The above comment about Datalog seems correct (it adds transitive closure to relational algebra, but takes away negation), but then you seem to go wrong in the second part. You don't need anything more than a pure subset of Prolog for Turing Completeness.


Prolog has function symbols

Prolog has function symbols, Datalog doesn't. Your SO answer makes exactly the same point that they are required for Turing completeness in addition to recursion.

Function Symbols

You are right, I guess I just haven't made the distinction between Prolog clause heads with single or multiple arguments. This seems to me to be a naming thing. My pure Prolog interpreter just has atoms, structures, clauses and rules, so I did not see a distinction.

Turing machine is easy...

Every imperative programming language has assignment and looping, and therefore implements a Turing Machine, but the relational algebra has more of a functional lean to it. I'm researching what needs to be added to the RA to achieve comparable levels of expressiveness and computational power without going down the imperative path.

Prolog or Datalog

Short answer, Prolog or Datalog. For longer answer see above.

Edit: see above, Datalog is not Turing complete, however Prolog is.

Modern datalog is TC

Datalog is a declarative logic programming language that is syntactically a subset of Prolog, but in which queries are deterministic.
Modern Datalog implementations include negation and functional symbols, and thereby achieve TC.

Right for wrong reasons.

So I was right for the wrong reasons :-) I should have just stuck to the original insight, that Prolog (and now Datalog) is what you get when you add what is missing from relational algebra to make it Turing complete.

I think I would rather say

That Datalog is what you get when you extract a deterministic subset of Prolog that you can use as a query language, solve the negation problem and then add back in just enough to make it computationally complete again.

Simplicity and Elegance

Perhaps, but Prolog seems to be simpler to define and has that combination of simplicity and elegance that we call expressive power. Stratified negation like Datalog has is interesting, but I don't think it is the solution to the negation problem. I think something like constructive negation is more likely to be the solution to the negation problem.

Edit: I think I should really say "Prolog Like" as I am talking about the pure part of Prolog (no cut or other unsound operations, unification with an occurs check).

Some additional references

I received the following off list. I found them useful.

Computable queries for relational data bases

Domain-Independent Queries on Databases with External Functions

Universality of Data Retrieval Languages