Functional Relational Programming: Out of the tar pit

In a similar vein to John Carter's recent submission, here's an attempt to attack the complexity problem: Out of the tar pit

Abstract: Complexity is the single major difficulty in the successful development of large-scale software systems. Following Brooks we distinguish accidental from essential difficulty but disagree with his premise that most complexity remaining in contemporary systems is essential. We identify common causes of complexity and discuss general approaches which can be taken to eliminate them where they are accidental in nature. To make things more concrete we then give an outline for a potential complexity-minimizing approach based on functional programming and Codd’s relational model of data.

They basically advocate minimising mutable state, moving any mutable state that remains into relations, and specifying the behaviour (i.e. manipulation of relational data) in a declarative/logical/purely-functional language.
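
As a rough sketch of that shape (my own toy Haskell encoding, loosely in the spirit of the paper's estate-agency example; it is not the language the paper actually proposes), the essential state is a base relation and everything derived from it is a pure function over that relation:

    import qualified Data.Set as S

    -- A relation is just a set of tuples.
    type Relation a = S.Set a

    -- Base relation of essential state: (property, offered price) pairs.
    offers :: Relation (String, Int)
    offers = S.fromList
      [ ("22 Elm St", 200000)
      , ("22 Elm St", 215000)
      , ("9 Oak Ave", 150000) ]

    -- Derived relation: the highest offer per property, computed
    -- declaratively from the base relation, never stored.
    bestOffers :: Relation (String, Int)
    bestOffers = S.filter isBest offers
      where
        isBest (prop, price) =
          all (\(p, q) -> p /= prop || q <= price) (S.toList offers)

    main :: IO ()
    main = print (S.toList bestOffers)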

Pros: the first half of the paper is a reasonable review of the types and causes of complexity in software.

Cons: lack of any implementation, other than a Scheme prototype of the relational bits, I think (see footnote 25). No source code. Lack of detail about interfacing with external systems.

Here's a link to Ben Moseley's FRP page, where you can find the paper, plus presentation slides and a discussion group.

I am not entirely persuaded that state manipulation is 100% bad.

I am not entirely persuaded that state manipulation is 100% bad. Manipulating state is necessary in many tasks for performance and convenience reasons. What is bad is that current mainstream programming languages do not offer the means to abstract state manipulation.

I think the ideal programming language should offer layers of abstraction:

1) In the bottom layer, a C-like programming environment where state/control flow is manually programmed.
2) In the middle layer, an object-oriented environment that allows separation of concerns.
3) In the top layer, a declarative environment that allows the programmer to declare the application as a set of relationships between data sets (as the paper says), including functions.

I think that the current situation is very close to the above, but not in a planned way: the various programming languages that exist offer the functionality of one or two layers, but not all. For example, one can use ML or Haskell to code an application's logic, then use C for the really "difficult" tasks.

Ultimately, though, using a mixture of programming languages does not help towards less complexity. The complexity is moved from the language to the development environment.

State

State manipulation is not inherently bad -- rather, limiting your state manipulation to a certain scope of objects for any given task is good. Functional programming makes this nice because you know the only object modified is the one you're returning to the caller :)
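
For instance (a minimal Haskell sketch; the Account type and deposit function are made up): a pure "update" just returns a new value, so the caller's original is untouched and nothing else can observe a change.

    data Account = Account { owner :: String, balance :: Int }
      deriving Show

    -- "Depositing" produces a fresh Account instead of mutating the
    -- argument in place.
    deposit :: Int -> Account -> Account
    deposit amount acct = acct { balance = balance acct + amount }

    main :: IO ()
    main = do
      let a0 = Account "alice" 100
          a1 = deposit 50 a0
      print a0  -- still Account {owner = "alice", balance = 100}
      print a1  -- Account {owner = "alice", balance = 150}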

Of course, some applications (say, emulating a CPU) are all about state manipulation and would not benefit at all (well, much) from a functional language.

The Headache of Stateful Concurrency

Not only that, but by avoiding the use of state when you can, it is quite easy to avoid a lot of headaches associated with concurrency. Without state, a calculation no longer needs to lock resources to avoid nondeterminism-based errors, nor does it need to perform hairy synchronization to keep stateful objects updated. So, without state, you can just tell a calculation to go off and compute itself somewhere and come back with the answer when it's done, without all the headaches that I, at least, associate with stateful concurrency.

Now, that said, you still need some state behind the scenes to keep everything working, such as futures for passing evaluation results. There's a trick to doing statefulness with concurrency *right* that tends to scare people away.
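
For a small illustration of the stateless side (a sketch using the async package's async and wait, which behave like futures; the computations themselves are made up): the work is pure, so nothing needs locking, and the only behind-the-scenes state is the future each result comes back through.

    import Control.Concurrent.Async (async, wait)  -- from the "async" package

    main :: IO ()
    main = do
      -- Pure computations handed off to other threads; $! forces the
      -- work to happen on the spawned thread rather than at wait time.
      sumFuture  <- async (return $! sum [1 .. 1000000 :: Integer])
      wordFuture <- async (return $! length (words "out of the tar pit"))
      total <- wait sumFuture    -- block only when the answer is needed
      count <- wait wordFuture
      print (total, count)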

Trick or treat?

Is this a trick or some insightful way of dealing with concurrent state?
Because I tend to see futures as lazy evaluation, just in a not-so-fancy dress.

The nice thing about state is that it can be freely copied and distributed if it is immutable.

Still, if you want to "change" immutable state, make sure you refer to the new value with a new unique name while keeping the old state intact. Otherwise you're in big trouble.

Sharing immutable state is easy. Concurrently sharing (unique) references even more so.

Current databases/applications achieve shared state a priori by introducing some kind of lock mechanism. But why use locks? Because they allow the illegal re-use of the same reference for different states in a system (=objects).

To achieve maximum concurrency, locked state and re-used names should be completely banned in a distributed setup.

This rule will eventually give us the scalability everyone wants - it's the referential transparency rule!
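
A minimal sketch of that "new unique name per state" idea (my own toy version store, not something from the paper): every update gets a fresh version id, old versions are never overwritten, and any of them can be read or copied without locks.

    import qualified Data.Map.Strict as M

    type VersionId = Int

    -- A store is the next fresh id plus every version ever committed.
    type Store a = (VersionId, M.Map VersionId a)

    emptyStore :: Store a
    emptyStore = (0, M.empty)

    -- "Changing" state means committing a new value under a new id;
    -- old versions are never overwritten.
    commit :: a -> Store a -> (VersionId, Store a)
    commit value (nextId, versions) =
      (nextId, (nextId + 1, M.insert nextId value versions))

    lookupVersion :: VersionId -> Store a -> Maybe a
    lookupVersion vid (_, versions) = M.lookup vid versions

    main :: IO ()
    main = do
      let (v0, s1) = commit "draft"   emptyStore
          (v1, s2) = commit "revised" s1
      print (lookupVersion v0 s2, lookupVersion v1 s2)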

question

What if, for example, a specific field of a record must be updated and two users have opened the update form? The database will end up having two records for the same thing, whereas only one was expected.

Not quite how that works....

Your problem is where "one was expected."

Consider the problem in the context of a bug database, just to make it concrete. Each bug is numbered, and your two users are both submitting updates to the same numbered bug.

The approach advocated above is to let both users submit their updates, and let both those updates live in the database as separate records. Later, when some third party asks for the "state" of the bug, the database constructs it from all the updates it knows about. Instead of having just one record for each bug, the database has an open-ended set of records for updates, and the key field for updates is the bug number. Each "bug" may comprise hundreds of records of updates, allowing people to get older versions of the bug just as easily.
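
A minimal sketch of that model (the note-based updates and names are my own invention, just to make it concrete): the database is only ever appended to, and the "state" of a bug is a pure fold over its updates.

    import qualified Data.Map.Strict as M
    import Data.List (foldl')

    -- Each update is its own record, keyed by bug number.
    data BugUpdate
      = AddNote    Int String
      | DeleteNote Int String
      deriving Show

    -- Reconstruct the current notes of every bug from the full update log.
    bugState :: [BugUpdate] -> M.Map Int [String]
    bugState = foldl' apply M.empty
      where
        apply m (AddNote    n note) = M.insertWith (++) n [note] m
        apply m (DeleteNote n note) = M.adjust (filter (/= note)) n m

    main :: IO ()
    main = do
      let updates =
            [ AddNote    42 "crash on startup"
            , AddNote    42 "suspect the DLL"
            , DeleteNote 42 "suspect the DLL"  -- later ruled out
            ]
      print (bugState updates)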

There is one drawback to this model aside from its inherent memory-hoggery, and that is that it tightly limits the use of any non-composable operations. For example, editing a general "notes" field. Important discoveries submitted simultaneously, even if they both get recorded, may not both get into the next version of the bug if the recording actions are noncomposable.

For example, Alice, who has just discovered that the DLL they've been suspicious of isn't an issue, edits out the note that points at the DLL. Simultaneously, Bob, who has tested browser compatibility, edits out the note that points at IE8. Now Carol comes along and asks for the state of the bug. Which version of the "notes" field does she see? The one Bob submitted, which still mentions the DLL, or the one Alice submitted, which still mentions IE8? Neither is entirely correct.

In order to make the model where all updates live in separate records in the database (so there is no locking) work, you have to pretty much eliminate any freeform text fields that can be edited. Instead you introduce a somewhat more restrictive procedure where you can "add" a note or "delete" a note (because both additions and deletions are composable) but cannot "edit" a general notes field (because editing is not composable).

Ray

In order to make the model

In order to make the model where all updates live in separate records in the database (so there is no locking) work, you have to pretty much eliminate any freeform text fields that can be edited

Or you add composition to free-form text editing via merge algorithms, and push conflict resolution to one of the users when necessary.

Generalizing this approach of fork-join-merge, you get Microsoft Research's concurrent revisions.
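
For instance, a composable merge over sets of notes might look like this (my own toy three-way merge, not the concurrent-revisions API): deletions by either side stick, additions by either side are kept, and Alice's and Bob's edits from the example above combine cleanly.

    import qualified Data.Set as S

    -- Three-way merge of two forks against their common ancestor.
    merge3 :: Ord a => S.Set a -> S.Set a -> S.Set a -> S.Set a
    merge3 ancestor left right =
      S.unions
        [ left `S.intersection` right     -- kept by both sides
        , left  `S.difference` ancestor   -- added on the left
        , right `S.difference` ancestor   -- added on the right
        ]

    main :: IO ()
    main = do
      let ancestor = S.fromList ["suspect the DLL", "fails on IE8"]
          alice    = S.delete "suspect the DLL" ancestor
          bob      = S.delete "fails on IE8"   ancestor
      print (S.toList (merge3 ancestor alice bob))  -- both notes gone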

Text merge algorithms are a problem.

You can try. But at this point we don't really have software that understands human language well enough for that to really be a reliable approach. It's a neat party trick, and works often enough to be convenient, but it's just not something to trust with mission-critical information.

Also, users (even engineers) will rapidly learn a "default action" for merging that they can do without thinking, and that will make them effectively useless for occasional conflict resolution.

Text merge as cooperative work

Sandro was suggesting the text merge for human language be performed by a human ("push conflict resolution to one of the users when necessary"). This actually works quite well.

Conflict avoidance also improves the experience - e.g. highlighting the regions that other people are concurrently editing, and providing a chat/comments/annotations channel between editors.

Also, while a default merge action is likely, you cannot conclude that it makes users useless for conflict resolution (and, like git, you could force developers to handle edits that lack a high-confidence merge). You'll just need to provide history and undo, too.

Concurrency and State etc

There are obviously many different approaches to concurrency that improve on the commonly-used monitor-based systems - Erlang, Oz, the various process calculi-based approaches, and even the STM stuff in Haskell.

I guess our question is - how often is concurrency really an _essential_ part of the problem the users need solved?

...and of course we ask the same question about state - how often is it really an _essential_ part of the problem?

...if we can succeed in separating out the aspects of the problem which are accidental, then I think that would bode well. In my experience it is the _accidental_ complexity which kills really large systems...

Amen

It will be a great day indeed when the default approaches to basic architectural concerns in programming languages, such as mutation and concurrency, aren't the most complex ones available to reason about, and therefore to use correctly! Peter Van Roy and Tim Sweeney have already covered what the alternatives might look like at length, so I won't rehash them here (and there was much rejoicing).

Lock-free concurrency

It isn't really all that hard to make concurrency lock-free. There's a pattern I use that has made it fairly painless for me.

I just have threads halt while they have any child threads running. The child threads can read data allocated by the parent thread but treat it as immutable (which it effectively is since neither they nor their siblings will mutate it, and the parent thread isn't running while any child threads run). But each child thread can have its own data, which of course it can mutate without fear because it's the only process that knows about it. UNTIL it spawns its own child threads, whereupon it halts until _they_ all complete, and they treat the data of _their_ parent thread as immutable....

In each thread, when preparing to launch child threads I allocate a buffer for child return values; each location in that buffer is available to exactly one of the child threads for reading and mutation, and may be used to store a result (or a pointer) for the parent to "inherit" after the child thread has run.
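
A minimal Haskell sketch of that pattern (the names are mine): the parent allocates one result slot per child, forks the children, and then blocks until every slot is filled, so it is effectively halted while they run; each child reads the parent's data but writes only its own slot.

    import Control.Concurrent (forkIO)
    import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

    -- Run one child thread per task; each child may read parentData
    -- but owns exactly one result slot.
    runChildren :: [Int -> Int] -> Int -> IO [Int]
    runChildren tasks parentData = do
      slots <- mapM (const newEmptyMVar) tasks          -- one slot per child
      mapM_ (\(task, slot) -> forkIO (putMVar slot $! task parentData))
            (zip tasks slots)
      mapM takeMVar slots                               -- parent waits here

    main :: IO ()
    main = do
      results <- runChildren [(* 2), (+ 1), subtract 3] 10
      print results  -- [20,11,7]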

Anyway, it's fairly painless.

Ray

Export

That sounds similar to the way Unix handles process-level concurrency. Processes can read, but not modify, environment variables set by the calling process.

MVC?

So I read that paper, going yes, yes, yes, I agree with that, yup, yes, yes....

When I got to the end I said MVC.

Admittedly a somewhat purer form thereof, but MVC nonetheless.

new link

Thanks

Though I must admit that, as interesting as it is, I find it a bit shallow:
- no implementation nor implementation idea

- apparently the external world is seen as static; what is to be done when there's a state change in the external world?
[ this may be caused by me reading the paper too fast ]

- apparently no progress since 2006

Implementation in progress...

I'm starting on a toy implementation that will hopefully become more than a toy someday and have some *very* basic questions that I've posted on the Google group. Maybe posting here will awaken the sleeping giant. :)

Source Code finally

For everyone interested, the source is available now:
link