Persistent functional databases

Using nomads to facilitate 'destructive' updates in a functional setting feels cumbersome and hack-ish for a (semi)-functional programmer like me.
An alternative would be to use persistent data structures that exploit the non-destructive property of purely functional ones, without incurring great costs.

Chris Okasaki has been one of the few researchers who seem to have ignored nomads altogether. Instead, he has created very interesting alternatives called 'confluently persistent data structures' for almost all the traditional ones, such as lists, arrays and maps.

Armed with these algorithms, one could build a full (relational) database engine that never destroys any data, but only adds to it, while still being able to jump to previous versions instantly - in O(1) - giving enormous scalability, distribution and accountability advantages over traditional solutions.

I've found one implementation of such a database: Herodotus. It isn't confluently persistent, however, because it only has read access to previous versions, making it only 'partially persistent'.
Still, I think Herodotus may be not only the first, but also the next step in replacing the imperative world of database technology today.
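
As a toy illustration of the 'previous versions are free' part (this uses an ordinary persistent map from Haskell's containers library, nothing Herodotus-specific; a real engine obviously needs far more):

  import qualified Data.Map as Map

  type Version = Map.Map String String

  -- Each 'commit' just conses the new snapshot onto the history.
  -- Taking a snapshot copies nothing - structure is shared - and
  -- every old version stays readable forever.
  commit :: Version -> [Version] -> [Version]
  commit v history = v : history

  main :: IO ()
  main = do
    let v0 = Map.empty
        v1 = Map.insert "name" "Herodotus" v0
        v2 = Map.insert "kind" "persistent database" v1
        history = commit v2 (commit v1 (commit v0 []))
    print (Map.toList (history !! 2))  -- v0, the oldest version, still intact
    print (Map.toList (head history))  -- v2, the latest version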

monads?

Is the following paper the best one to understand Okasaki's work?
Functional Data Structures (1996)

For a while now I've been trying to get through Comprehending Queries by Torsten Grust (and many supporting papers). It is very interesting (from what I am able to understand). Looking forward to getting into Okasaki's work as well.

get his book. iirc it extend

get his book. iirc it extends the online stuff. and it's not expensive.

mnesia

Don't know much about it but I've heard good things about mnesia (paper).

It's because they roam . . .

. . . that Okasaki ignores nomads. Persistent data structures are the agricultural revolution to your single-threaded destructive update.

Wait, you meant monads?

:-)

nomads != monads

We should not ever ignore nomads as a group of people, and certainly not use them for running software.

Anyway, the question remains - do we really need monads to implement stateful applications?

Nomadic Wisdom

As far as my experience goes, you don't need monads as such; you only need them to encapsulate state. I.e., in the case of the IO monad, by not making the world (state) explicit but only revealing the actions which work on that state, type safety gives you a way to ensure that the world (state) is never revealed and thus cannot be copied (thus no old reference to an old world state can exist).

Another manner, as implemented by Clean, is to explicitly pass the world (state) around and make sure that at all times there is just one unique reference to that state (thus no old reference to an old world state can exist).
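
A conceptual sketch of the two styles (this is the usual folk model of IO as a state transformer, not how any real compiler implements it; all names here are invented for illustration):

  -- A stand-in for 'the state of the outside world'.
  data World = World

  -- Clean style: the world is passed around explicitly. (Clean's
  -- uniqueness types forbid keeping an old copy; not modelled here.)
  putLine :: String -> World -> World
  putLine _msg w = w  -- pretend we wrote to the console

  -- Monadic style: the same state passing, wrapped so the World never escapes.
  newtype IO' a = IO' (World -> (a, World))

  putLine' :: String -> IO' ()
  putLine' msg = IO' (\w -> ((), putLine msg w))

  -- Sequencing threads the hidden World from one action to the next.
  andThen :: IO' a -> (a -> IO' b) -> IO' b
  andThen (IO' f) k = IO' (\w -> let (x, w') = f w
                                     IO' g   = k x
                                 in  g w')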

Personally, half of the time I think 'why bother'; isn't it a solution to a problem which doesn't really exist? I said it before: the world is there, always; why can't I just use it, and why must I pass a reference around? I know the philosophical answer, it just bothers me as a programmer.

Synchronization

marco: Personally, half of the time I think 'why bother'; isn't it a solution to a problem which doesn't really exist? I said it before: the world is there, always; why can't I just use it, and why must I pass a reference around? I know the philosophical answer, it just bothers me as a programmer.

It basically depends, I think, on how far you wish to go with the idea that any function can be used anywhere, anytime. If you're willing to learn to use monads and monad transformers, then you gain a great deal of freedom in how you compose functions, when and where you call them, and so on. It avoids a whole raft of real-world problems. I've lost track of how many projects I've seen go off into the weeds because object A expects object B's state to be X at time T, but because of some refactoring, object B isn't yet in state X at time T, etc. There's something to be said for having the language help enforce proper sequencing.

Not convinced...

Bad example. In the monadic functional variant of your example object B would have been stuck into some monad and passed around and at some point assumed to be in state X. Now somebody changes B's behavior, and presto, B isn't yet in state X when A wants it, same run-time error. Don't see where monads help here since the only difference is that the world is passed around explicitly.

Hmpf, let's drop the discussion.

Re: hmpf

But it was just starting to get interesting! (Seriously.)

Speaking as a programmer...

...the reason you don't want to pass references around if you can help it is because aliasing is a giant fucking pain in the ass.

Speaking as a theorist, the reason that aliasing is a giant fucking pain in the ass is that keeping track of which program values are and aren't aliased, and how, is not something that conventional first-order logic is very good at. If you've got the primitive proposition that 'a != b', then if you want to state that (say) six different variables (or array elements, or object fields) are distinct you'll need to write something like:

a != b & a != c & a != d & a != e & a != f &
b != c & b != d & b != e & b != f &
c != d & c != e & c != f &
d != e & d != f &
e != f

This is bad, because the number of non-aliasing conditions goes up quadratically with the number of variables, and even worse, because if you've got an array of mutable data structures, you also need to state that the elements of an array are distinct.

So if you've got a program with aliasing, then if you try to reason about it in FOL, you're going to run into the problem that that quadratic blowup means that you'll have about an ounce of real specification, and endless, endless reams of completely uninformative non-aliasing conditions.

So you cheat, and get things wrong, and your program crashes. Or, you can use a spatial logic, like separation logic, to reason about the heap. But now you're using some of the most exotic logical formalisms people have ever invented to reason about your programs. That may or may not be a practical response, depending on how you define practical -- do you consider keeping the heap more or less practical than sticking to the sort of mathematical reasoning everyone actually understands? (I can see a case for either viewpoint, myself.)
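
For concreteness, a minimal Haskell sketch of the nuisance being discussed: two names for one mutable cell, so an update through either name is visible through both, and any formal reasoning has to keep track of exactly which names are aliases.

  import Data.IORef

  main :: IO ()
  main = do
    r <- newIORef (0 :: Int)
    let alias = r            -- a second name for the same mutable cell
    writeIORef alias 42      -- an update through the alias...
    v <- readIORef r
    print v                  -- ...is visible through the original name: prints 42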

"Read me first"

Be sure to read legal notice before you "use this Site" (Herodotus). Also read patent info on Herodotus.

Some people don't get it

I hate it when companies don't understand the internet. If you don't want something linked to fine, don't put it on the internet. Anything that has a url is fair game though. The entire notion of being able to control who links to you is repulsive.

History as total order

It isn't confluently persistent, however, because it only has read access to previous versions, making it only 'partially persistent'.
I wonder if this means the same as my complaint, which is treating history as a total order. I doubt this works for many problem domains.

One counterexample would be undo/redo - imagine undoing all transactions committed since last Monday because you want to undo a single wrong transaction. Cause/effect is usually a DAG, not a chain.

BTW, the difference extends into the future as well as into the past - with a DAG (partial order) history has multiple "points of growth", while with a total order there is just one (that's why I feel my complaint is similar).

Undo/Redo is not a counterexample.

You could also undo a transaction in version A by introducing a new transaction that fixes the problem, creating a new version B.

The nice thing about being 'confluently persistent' is that you can 'merge' any versions (old or new, it doesn't matter) to create new versions in O(1) time.
The only thing that's really difficult is a three-way merge between versions A, B, C in O(1) (if applicable).

Imagine version A that is altered by process 1, producing version B. Imagine another process 2 altering version A, producing version C.
If everything works out fine (there are no collisions), then we can merge versions B and C using base version A (= three-way merge) with process 3, creating version D.
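
The branching part of that scenario is easy to show with a plain persistent map (Haskell's Data.Map here; this only illustrates cheap independent versions, not the O(1) merge):

  import qualified Data.Map as Map

  versionA, versionB, versionC :: Map.Map String Int
  versionA = Map.fromList [("x", 1), ("y", 2)]
  versionB = Map.insert "x" 10 versionA  -- process 1 alters A, producing B
  versionC = Map.insert "z" 3  versionA  -- process 2 alters A, producing C

  main :: IO ()
  main = do
    print (Map.lookup "x" versionA)  -- Just 1 : the base version is untouched
    print (Map.lookup "x" versionB)  -- Just 10: B sees its own update
    print (Map.lookup "z" versionC)  -- Just 3 : C is an independent branch of A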

You can compare this approach to imposing a total order on multiple concurrent writers in such a way that they don't deadlock when there is no need for it. This technology is already implemented in an imperative setting (Oracle, Sybase) after years of research.

I would like to see a purely functional implementation, which should be much simpler to build and prove correct.

But maybe not, because implementing confluently persistent data structures is not for your average programmer.

Maybe FP could be a little looser when it comes to I/O

I understand that FP programmers are used to not thinking in terms of execution order, so I/O, since it is a destructive update, is a problem for functional programming languages: you never know which operation will come first. Monads are a nice abstraction that provides the sequencing needed for I/O. But using monads on a wider scale may be a problem (for example on a whole bunch of applications needing sequencing), and certainly imperative programmers have trouble adjusting to the concept of monads.

But what would happen if there was a functional programming language in which order of execution mattered? Maybe it wouldn't be that bad after all! I think that many programmers can cope with the problem of execution order very easily, since it is something that relates to real life: things must happen in a specific order, otherwise the result is not the wanted one. After all, even in functional programs, code must be organized in a way that certain things take place before other things.

I think the problem of referential transparency is much greater inside a program than outside of it. I/O usually happens in pseudo-monadic fashion anyway: the data are written to a file in one place, as a batch operation. For example, most applications do I/O when the user presses the 'save' button. Maybe another program comes and modifies that data...but that does not matter to the current program, because the data the current program is interested in are already present in the program's memory. Referential transparency is much more important inside a program, because the chance of introducing a bug increases exponentially with every variable added to an imperative program.

Let me give you a concrete example: let's say that one part of a program reads a list of 'customer types' from a database and another part of the program writes those 'customer types' to the database. Let's also suppose that a computation in the program depends upon the customer type (this is a real example taken from mobile telephony applications)...let's now suppose that the user deletes one 'customer type' from the database. Does it matter to the application? I think the problem is not that great, because this action is deliberate: the application is meant to modify the 'customer types', and therefore code that depends on customer types must be written in such a way that no customer type value is assumed to exist forever.
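
As a hedged sketch of that last sentence (all names invented for illustration), code that depends on customer types keeps the possibility of deletion in its type rather than assuming the value is always there:

  import qualified Data.Map as Map

  type TypeId = Int
  newtype CustomerType = CustomerType String deriving Show

  -- The customer types as read from the database at some moment.
  type CustomerTypes = Map.Map TypeId CustomerType

  -- A computation that depends on the customer type; it cannot assume
  -- the type still exists, so the result is a Maybe as well.
  rateFor :: CustomerTypes -> TypeId -> Maybe Double
  rateFor types tid = case Map.lookup tid types of
                        Just (CustomerType "prepaid") -> Just 0.15
                        Just (CustomerType _)         -> Just 0.10
                        Nothing                       -> Nothing  -- type was deleted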

I apologise for this post that goes against mainstream thought, but maybe there are some things that must be sacrificed so that functional programming gains wider acceptance. I recognize the many good things that FP has to offer, but I also see the huge gap in mentality between the imperative camp, which is the majority of programmers, and the FP one. I think that for this gap to close, FP languages must make some sacrifices, and maybe free I/O could be one of them.

Order of execution.

As far as I know, all currently popular functional PLs care very much about execution order. "Functionalness" means that evaluation doesn't have side-effects, but being side-effect free doesn't automatically mean you can ignore evaluation order.

Moreover, classical Turing machines and equivalent formalisms are strictly deterministic and thus also dependent on proper ordering of evaluations.

Thus, if you want code that doesn't "think in terms of execution order", you need to break free from the classical deterministic computational model. This, in turn, leads you directly to non-deterministic Turing machines, NP problems and all sorts of other hoary subjects. This is a difficult and unexplored domain.

But back to the topic at hand: I/O is basically orthogonal to computation. Given a proper abstraction for doing both, a PL where computation and I/O exist in separate layers and don't infringe on each other is conceivable.

Turing, Computing and Communication

I/O is basically orthogonal to computation
Coming from a pi calculus background, I cannot agree. I (sometimes) tend to see computation as a special case of communication.

And yes, this is not my personal opinion; I basically borrowed it from Robin Milner. His short paper Turing, Computing and Communication [1] is a quick introduction to ideas like that.


[1] I suspect this is essentially the text of his Turing award lecture "Elements of interaction", but having no ACM access I cannot confirm.

Tricky, that.

On the face of it, "communication" is not at all equivalent to "input/output".

Basically, there should be a way of doing I/O that does not involve "communication" in the CS sense.

No taxation without communication. Er, no I/O...

On the face of it, "communication" is not at all equivalent to "input/output".

What is I/O if not communication with the environment?

Basically, there should be a way of doing I/O that does not involve "communication" in the CS sense.

What makes you believe there is such a way, and why would that be beneficial? For this post I will pretend I understand what you mean by "communication" in the CS sense.

Well.

"Commnication" being what you meant, and not what's written in Webster's for that word. :)

Back to the topic, though -- I think I/O is a proper subset of "communication", precisely because I/O can be done in a stateless manner. (I think; though I don't think I'm ready to frame that in a formal manner.)

On further reflection:

Imagine a purely functional world in which only stateless functions and values exist.

Given a countable set of input signals, we can strictly associate each input signal with a particular function; thus, we get input. Being able to represent any given particular value statelessly gives us output.

Moreover, what I outlined is not too far from reality: this is basically exactly how HTML and web apps work.
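
A small sketch of that idea, assuming nothing about any particular web framework: each input signal (here a request path) is associated with a pure, stateless function, and the value it returns is the output.

  type Request  = String
  type Response = String

  -- Input signals mapped to stateless handler functions.
  handlers :: [(Request, Request -> Response)]
  handlers =
    [ ("/hello", \_   -> "Hello, world")
    , ("/echo",  \req -> "You asked for " ++ req)
    ]

  respond :: Request -> Response
  respond req = case lookup req handlers of
                  Just h  -> h req
                  Nothing -> "404 - no handler for " ++ req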

No, you're not supposed to get it

No, it doesn't make any sense. Do not reply to this poster; he is a troll. Ehud agrees.

Don't feed the trolls!

Stop flaming.

I read this as saying that you have nothing constructive to contribute?

You're off-topic anyways.

They appear to be different.

They appear to be different. I have emailed your lanet.lv address regarding this.

I am not sure, but from what

I am not sure, but from what I understand, persistent functional databases never lose any data because they archive the old data while they write the new, and old data can be retrieved in constant time.

I work on relational databases and we use a technique called milestoning to do exactly this. This technique also provides a history of what has happened to the data.

This is the way a table is milestoned. To every table with a primary key, we add 2 new columns, from_date and to_date. Initially the from_date is the date on which the data is inserted and to_date is a very large date, the default being '31Dec9999'. This row is called the open or active row.

When this row has to be updated, the to_date is set to the present date and a new row is inserted with the modified data, from_date being the current day and to_date being '31Dec9999' (this way the old data is persisted).

From now on, the primary key of this table will be extended by from_date and to_date.

To get all open rows just do
select * from <table_name> where to_date = '31Dec9999'

This is an easy way to get history. Now to get the state of a row at a particular moment, all we need to do is keep that date between the from_date and to_date, as in
select * from <table_name> where your_date >= from_date and your_date <= to_date

A more granular version is to use time in milliseconds to trace history.

If I got them right, I feel that functional persistent databases are only making this whole process internal, which in effect can be added to the existing relational model very easily.

I feel that extending the SQL spec so that a table can be created with this milestoning will have the same effect.

But one problem with this model is that sometimes data hits the max database capacity, which is a limitation of the implementation but not the concept.

Implementation details

I agree that a temporal database can be built on top of a standard one (see Foundation for Future Database Systems: The Third Manifesto (2nd Edition)).

But have you ever built a standard RDBMS? I think approximately 10,000 man-years of work went into building the Oracle database, for instance.
Also, the ACID properties are very difficult to implement correctly without losing too much performance.

What I'm suggesting is that by using (functional) persistent data structures, you can get the ACID properties almost for free.

However, I think confluently persistent data structures cannot be built on top of a standard RDBMS, because this would mean that any version of a database can be 'merged' with any other version in O(1).

performance cost of milestoning in relational db applications

Hi 'mansu', regarding the milestoning technique that you mention here, in which the old data is archived / timed out: does it not degrade the performance of the underlying application considerably because of the costs of the implementation? Were you able to get reasonable performance in spite of these costs and, if so, how?

No, there is no performance

No, there is no performance degradation, because all you add are two more conditions to the where clause.

The lookup time does not change, because a milestoned table will have the milestoned columns as part of the index.

is there anything that says h

is there anything that says how access time and memory use scale with database size? or, more exactly, how that compares to a traditional implementation? and how quickly it starts from cold?

making some naive assumptions about how it's implemented i'd be worried that once the database exceeds the available memory this will run terribly since it will have to follow pointers from one part of a file to another. perhaps it's ok with only indices in memory (assuming you do sensible queries)?

i have no idea how databases work, so maybe this is true of traditional approaches too?

and something related to work - would this make it easier to synchronize two databases? i'm thinking of separate databases that, over the long term, need to track each other, but over the short term may have different local changes (obviously there's the possibility of conflicts).

Virtual memory/paging

Traditional databases can hold terabytes of information on disk with only a small portion of it in main memory. They use algorithms to swap the most interesting bits into memory - comparable to how virtual memory works.

It's a little bit more involved than virtual memory, because the query optimizer will favor query plans based on a lower (estimated) number of swaps.
Query plans that involve fewer swaps (because all the pages hit by the query are already in main memory) have lower I/O costs.

Note that, in the case of a functional persistent database, there is no way for swapped-in pages to become 'dirty', because everything is immutable.
So there are no cache-coherence problems, something that makes traditional databases so hard to implement in a multiprocessor set-up.

Merging persistent functional databases could be very fast if there are no collisions. If there are collisions, some conflict resolution algorithm should be able to resolve them (three-way merge, first come - first served, etc.).
It is, however, not always possible to do this automatically.
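
A minimal sketch of such a merge over persistent maps (the helper name is invented, and this is nowhere near O(1); it only shows the collision test against the common base version):

  import qualified Data.Map as Map
  import           Data.Map (Map)

  -- Three-way merge of two descendants B and C against their common base A.
  -- A key collides when both sides changed it, and to different values;
  -- everything else merges automatically.
  threeWayMerge :: (Ord k, Eq v)
                => Map k v                  -- base version A
                -> Map k v                  -- version B
                -> Map k v                  -- version C
                -> Either [k] (Map k v)     -- Left: colliding keys, Right: merged D
  threeWayMerge base b c
    | null collisions = Right merged
    | otherwise       = Left collisions
    where
      keys        = Map.keys (Map.unions [base, b, c])
      changed m k = Map.lookup k m /= Map.lookup k base
      collisions  = [ k | k <- keys
                        , changed b k, changed c k
                        , Map.lookup k b /= Map.lookup k c ]
      pick k      = if changed b k then Map.lookup k b else Map.lookup k c
      merged      = Map.fromList [ (k, v) | k <- keys, Just v <- [pick k] ]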

If two databases are disconnected for a longer period, the chance of collisions will be higher (optimistic synchronization).
Synchronizing often would mean fewer conflicts but also less scalability or distribution possibilities (pessimistic synchronization). This is the default of your traditional database.

Comparing persistent functional databases with CVS or ClearCase shows no real differences in theory - with the exception that CVS cannot merge or retrieve versions instantly.

thanks. i was thinking tha

thanks.

i was thinking that only the disk store was a functional data structure, and that an in-memory "traditional" database would be constructed from it, but now i realise that need not be so (since i'm aware of okasaki's work and functional data structures in general i'm not sure why i assumed otherwise).

"Milestoning"

Interesting. I have designed and written a database system that uses something similar to this, but I would suggest is actually superior.

My design also added 2 new fields, but integers rather than dates - "retrovert" and "version". The current version of a record has version 0. When a record is updated, a new record is created and the old data is copied into this new record. (It's important to do it this way round as we want to maintain the same ROWID pointing to the "current" record, e.g. by Queries.)

The retrovert field points to the previous version of the record, (i.e. the one you've just created to hold the old data).

It's important to realise that the retrovert number does not relate to this table, but rather to a separate table called "logbook", which logs the date, time, user, tables altered and type of change. If several related tables are altered simultaneously as part of the same change, they will all have the same "retrovert" number applied. This is necessary because, if the user needs to undo their changes, I can then find all the records needing unwinding using the information in the "logbook" table.

It is possible for a user to undo their changes all the way back to the database creation, providing none of the record changes they are undoing have been modified by another user. I avoid the problem of merging changes (undos) by simply disallowing them once a change has been made by someone else. That way the user gets a completely safe Undo facility that, although it isn't unlimited, is at least quite comprehensive.

The retrovert and version fields within a table act in a similar fashion to pointers in a linked list that chain all the alterations made to a record over its lifetime.
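
A rough rendering of that chain as data (all type and field names are invented here, not the actual schema):

  -- One row of a versioned table: version 0 is the current record; older
  -- versions are reached by following retrovert numbers back in time.
  data VersionedRow = VersionedRow
    { rowVersion :: Int        -- 0 for the current record
    , retrovert  :: Maybe Int  -- logbook entry that archived the previous version
    , payload    :: String     -- the actual business data
    } deriving Show

  -- One logbook entry groups every row touched by a single logical change,
  -- which is what makes a multi-table undo possible.
  data LogEntry = LogEntry
    { logNumber :: Int
    , logUser   :: String
    , logTables :: [String]
    , logWhen   :: String      -- date and time of the change
    } deriving Show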

Furthermore, by querying the "logbook" it is possible to see what changes were made by who and when.

Even better, the bulk of this versioning is totally automatic and transparent: routines interrogate the database metaschema containing the structure of the table and generate temporary buffers, fields, etc. on the fly in order to archive the changes.

It's written in Progress 4GL, but I would be surprised if something similar could not be developed in other database systems such as PostgreSQL. (I wouldn't know about Oracle; my one experience with Oracle was about 10 years ago and I was amazed at how primitive it was.)

Re: Primitive Oracle

What, you thought the "boolean" type should be a primitive, built-in one? :-) It always flabbergasts me how something that is supposed to be about data integrity and understanding utterly falls flat in very simple ways.

Boolean

If I understand your remark correctly, you seem to be unhappy with Oracle/Sybase/etc not having the Boolean type.

The usefulness of the Boolean type for storing data in a database is not obvious.

Firstly, a [relational] database is supposed to store only *true* facts; every row is a true statement already, so why bother with Booleans?

Secondly, it can easily be emulated by just about any other already available data type.

Thirdly, in order to enforce type checking, Oracle (or DB2) offers user-defined types. So if one feels that Boolean is truly necessary, one can easily define it.

Firstly, a [relational]

Firstly, a [relational] database is supposed to store only *true* facts; every row is a true statement already, so why bother with Booleans?

Are you saying that a row that happens to have a flag that is false should not be stored in a database? Perhaps this confuses the issue of a tuple that is true or false with a value within a tuple that is true or false?

Secondly, it can easily be emulated by just about any other already available data type.

Well, every value ever invented in computing can be emulated with the S and K combinators, but it does happen to be rather inconvenient. The question is whether a boolean type is so fundamental a data type that it deserves first class citizenship.

Thirdly, in order to enforce type checking, Oracle (or DB2) offers user-defined types. So if one feels that Boolean is truly necessary, one can easily define it.

Ditto here, with the added emphasis that non-standardization of the boolean type means that it can be defined in any number of ways (as a char, a bit, a byte, an int, a value vs. null). Many complaints about PLs arise not from the idea that something is impossible, but rather that it is not standardized. This means that every new program or library you write has to worry about possible differences in interpretation, and about mixing non-standard libraries that might have encoded the boolean in different fashions.

Tuple


Are you saying that a row that happens to have a flag that is false should not be stored in a database? Perhaps this confuses the issue of a tuple that is true or false with a value within a tuple that is true or false?

A tuple models some real-world fact, so what is such a Boolean tuple value supposed to represent?

Open or closed

I think you need it when working with an open world assumption. In a closed world assumption you can say "every statement that is not in the database is false". But with an open world assumption you don't know anything about statements that are not in the database. So you will have to store the knowledge that a statement is false as well.

CWA

The standard relational model relies on CWA, so the point is moot.


But with an open world assumption you don't know anything about statements that are not in the database. So you will have to store the knowledge that a statement is false as well.

Your statements are contradictory: if "you don't know anything about statements that are not in the database", then clearly storing "the knowledge that a statement is false" is not gonna help at all.

Meaning of boolean?

I'm not sure I follow the reasoning on not storing boolean values that have a false value within a database. To me, it sounds like you are stipulating that only valid facts should be stored - which I'd agree with. But some information is boolean in nature - and both the true value and false value can be valid facts (though not at the same time).

That, and I don't see why the concept of a boolean data type would be fundamentally different in the Relational Algebra than it is in any other model of programming. Booleans are logic values, and I'd think that a relational database should be able to cope with them. Booleans can be interpreted as "Valid/Invalid", "Yes/No", "On/Off", "True/False", "0/1", etc... It's not the interpretation, though, that is standardized - it is the set of operations that can be performed on boolean values (behavior) that makes them useful.

Boolean

Could you give a real-life example where using Boolean in a relational database would make sense? More often than not (but not always), using various 'flags' indicates a serious flaw in the data model design.


That, and I don't see why the concept of a boolean data type would be fundamentally different in the Relational Algebra than it is in other model of programming

Because in other programming models, especially imperative ones, the type is often useful for describing program control flow, although even there it's of limited utility. In the relational model, the very fact of storing a valid piece of information is sufficient by itself, without resorting to using Boolean.

Versioning


Interesting. I have designed and written a database system that uses something similar to this, but I would suggest is actually superior.

What you described is a very well known concurrency control mechanism called 'multiversioning'. Its description can be found, for example, in Bernstein's "Concurrency Control and Recovery in Database Systems", 1987. Multiversioning was implemented in a number of commercial and free database systems, some of which date back as early as 1983 (Rdb).


(I wouldn't know about Oracle; my one experience with Oracle was about 10 years ago and I was amazed at how primitive it was.)

You might be even more amazed to know that Oracle implemented multiversioning as early as 1983, that is, more than 20 years ago. Knowing that might have prevented one from reinventing the wheel.

Versioning

Sounds to me like you've misunderstood what I've done. I'm not talking about the ability to undo/redo/retry a sub-transaction which ability has obviously been supplied by any half-way decent RDBMS since the 1980s, but rather to record the evolution of records over time - who did what, when and why.

As for the comment I made about Oracle which seems to have raised your ire, I was thinking of 1996 when a particularly egregious example came to my attention. I couldn't understand why we had so many duplicated indices in the database whose only difference was whether or not they were unique. It turned out that at that time this was because if you specified an index was unique, Oracle wouldn't actually bother creating it, it merely created a uniqueness constraint instead. Never mind that the database designer might have good reasons for saying that something needed indexing, and that the index needed to be unique - it knew best. (And what's more, this behaviour happened silently, without any warning.) So to get around this idiocy, every unique index had to be created twice. And this was certainly the case, because I insisted on getting out the Oracle manuals because I thought it couldn't possibly be right. But it was - clearly documented at that time.

And whilst we are on the subject of indices, what's with indices that only allow you to travel in one direction? Want the ability to find the last record as well as the first without traversing the entire table? You have to create a second index with direction reversed. More indices.

Like I said, Primitive.

Of course I hope it's improved since then, but that was certainly the case back in 1996.

Versioning, indices


Sounds to me like you've misunderstood what I've done. I'm not talking about the ability to undo/redo/retry a sub-transaction which ability has obviously been supplied by any half-way decent RDBMS since the 1980s, but rather to record the evolution of records over time - who did what, when and why.

That's precisely what Oracle has been doing for more than 20 years -- recording the row evolution over time (as opposed to DBs like Sybase or DB2). The row change history has been partially available via so-called savepoints since at least 1983. Starting with version 9i (2000), one has complete access to the history and can, for example, run a query like "select * from t1 as of timestamp '17:00:00 20-apr-2006'", where the timestamp can be separated from the current time by multiple commits. 'Who did what' has been available via the auditing interface for many years too. As to 'why', I doubt that one can answer that by just analyzing the row history.


As for the comment I made about Oracle which seems to have raised your ire, I was thinking of 1996 when a particularly egregious example came to my attention. I couldn't understand why we had so many duplicated indices in the database whose only difference was whether or not they were unique. It turned out that at that time this was because if you specified an index was unique, Oracle wouldn't actually bother creating it, it merely created a uniqueness constraint instead.

This is simply not true. An excerpt from the Oracle version 7 manual (released in 1993, 13 years ago):


UNIQUE Key Constraints and Indexes
Oracle enforces unique integrity constraints with indexes. In Figure 7 - 4, Oracle enforces the UNIQUE key constraint by implicitly creating a unique index on the composite unique key.

Do you have a reference to an Oracle manual to substantiate your claim?


And whilst we are on the subject of indices, what's with indices that only allow you to travel in one direction? Want the ability to find the last record as well as the first without traversing the entire table? You have to create a second index with direction reversed. More indices.

That is also not true. Oracle can (and could starting with version 7) traverse indices in both directions:


explain plan for select object_id from t1 order by object_id;
SELECT STATEMENT
INDEX FULL SCAN | T1_IDX

explain plan for select object_id from t1 order by object_id desc;
SELECT STATEMENT
INDEX FULL SCAN DESCENDING| T1_IDX

Oracle, etc.

Row evolution: I can record why because it's all under my control, but in any event it's interesting that Oracle can automatically access the evolution of the database - it's just gone up in my estimation. However I am able to link writes to related tables together (that are part of the same logical operation), so I can control things in a more fine-grained fashion than just knowing the time an operation took place.

Uniqueness constraints, etc.: I am afraid it most certainly is true - like I said, I didn't believe it at the time and so I looked it up. Of course it may not have been version 7; it was whatever version Mercedes Benz (UK) were using in 1996. Ditto for the one-way indices - FIND PREV was not available. As you say Oracle has only been able to traverse indices in both directions starting with version 7, perhaps we were working with version 6?

Unique constraints/indices

I do not remember version 6 behaviour, but I am quite sure about version 7. It behaves exactly as I described earlier. I also do not recall whether bidirectional index traversal was available before version 7.

Oracle Unique Indices

As I (obviously) don't have a copy of an Oracle database manual from a company I worked at 10 years ago, I have just done a quick google with the search terms "oracle unique index constraints".

The first entry is this.

This is Oracle themselves, and talking about version 10g, i.e. much more recent than 10 years ago:
"If you want a unique index in place, it is suggested you explicitly create it using CREATE UNIQUE INDEX. A primary key or unique constraint is not guaranteed to create a new index, nor is the index they create guaranteed to be a unique index. Therefore, if you desire a unique index to be created for query performance issues, you should explicitly create one."

Now that doesn't sound a million miles away from what our DBAs said back in 1996 - that if you defined an index as being unique there was no guarantee that Oracle would actually create such an index. So perhaps things haven't changed that much. (And in any event, I wasn't the one who created the database, I was just questioning why we had 2 copies of certain indices on so many of the tables).

Unique constraint/index


Now that doesn't sound a million miles away from what our DBAs said back in 1996 - that if you defined an index as being unique there was no guarantee that Oracle would actually create such an index.

You've misunderstood the reference. If you'd read it completely, you'd have realized that during a unique constraint creation there can be several possible scenarios:

1. If there is no index, Oracle will automatically create a unique index if the constraint is non-deferrable and a non-unique index if the constraint is deferrable;

2. If there is an existing index, either unique or non-unique, Oracle will use it as it is in order to enforce the unique constraint.

What the manual was trying to say is that if you want to ensure that the index is unique (for performance reasons), you'd better create it manually *before* creating the constraint. However, if you know how Oracle creates such index automatically (see 1), you needn't really bother.

In the case you've described, the DBA apparently did not quite know what he was doing as having just one index, either unique or non-unique, would have been enough to support the unique constraint.

Oracle

I'm at a disadvantage about the indices because it wasn't my responsibility, and it was 10 years ago. So you may well be right.

But I gave 2 examples of why Oracle seemed primitive to me. The other example, being able to traverse indices in both directions, I did have to deal with, and it is most certainly correct - you've admitted as much yourself.

Progress has its faults. It's not the fastest and the first versions of its replication mechanism were diabolical, but I've been working with it since version 4 came out in 1988, and I can never remember a time when I was unable to traverse an index in both directions. To me that's pretty fundamental. And it's a pretty poor show when you have to wait until version 7 of a product before you can actually do this.