Excel as a different programming paradigm

(via Gamasutra via /.) Reminds me of the Subtexts of the world: a 3D graphics system in Excel, with programming rant. Pretty fun.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Also reminiscent of

Also reminiscent of Mathematica 6's dynamic interactivity feature.

Excel's programming paradigm

This paradigm is called "nonmonotonic dataflow programming" in this paradigm taxonomy. It is very similar to concurrent logic programming. There is one problem with this paradigm: it introduces "glitches": extra nondeterminism that is not part of the problem specification. An improved version of this paradigm that removes the extra nondeterminism is called Functional Reactive Programming (FRP). It would be nice if Excel could implement FRP (hint to the Excel developers!).

I'm assuming you refer to

the ability to create cyclic references in Excel spreadsheets, and how Excel attempts to resolve them (by repeatedly evaluating terms until something converges).

In many cases where Excel is used, this is a reasonable approach.

Excel's execution model

Actually, I was referring to the incremental stream calculation: changing the value of a cell triggers a recalculation. That's exactly like adding a value to a stream in a nonmonotonic dataflow program.

Converging on cyclic references is an optimization: when a stream calculates values that are identical (or within a small epsilon), the calculation stops.

The problem of nondeterminism still exists in Excel. For example, consider b=a+1, c=b*a. Changing cell a causes b and c to change. But if c is calculated before b then there is a glitch: c changes twice. Usually you don't see it because you're changing one cell at a time and at a human time scale the glitch is invisible. But if Excel would be used as a programming environment (e.g., a server) that communicates with other programs through streams of cell values, then you would see the glitches. FRP gets rid of these glitches and restores determinism.

How is changing the value of a cell...

any different than changing a manifest constant in a program and re-executing? Excel, of course, blurs the distinction between code and data, any cell can contain either.

I do see your point about cells "changing twice". As Excel does permit cyclic references, a topological sort of the cells is probably out of the question, so my understanding is Excel evaluates cells in some trivial order (tables, rows, columns or something like that). OTOH, it would seem to me that sanity can also be restored by adding some transactional notion on top--until a computation stabilizes (or convergence is decided to be impossible due to too many iterations occurring), the intermediate results (which can contain cells in an inconsistent state) are not visible outside.

bias

Using a "trivial order" and iteration to convergence (or indefinitely) is known to lead to biased results in simulations. Some implementers choose to eliminate the bias with pseudo-random ordering, e.g., Swarm. Others use systolic "frame based" data synchronization, e.g., PST Parallel Inference Machine.

Excel's cells are streams

How is changing the value of a cell any different than changing a manifest constant in a program and re-executing?

It's not really different, but the spreadsheet can do a lot more (e.g., remember old values, modify itself). A cell corresponds exactly to a stream of values (in the same way that "state" in CTM is defined as a "sequence of values"). Changing the cell's value corresponds to adding the new value to the stream. The stream viewpoint is nice because it fits with how Excel works: the Excel run-time system "waits" for changes in cell values. So cells are just input streams and Excel runs as a dataflow language!

adding some transactional notion on top--until a computation stabilizes

You've hit on the issue. Keeping changes invisible until everything stabilizes is a possible fix. That's exactly the technique used in the FRP in Oz example. FRP researchers have come up with more clever fixes, including topological sorts extended to handle cyclic references.

I dealt with this issue in

I dealt with this issue in SuperGlue by having changes propagate where the notified then re-compute their values using new information. In this model, there is no buffering, which would otherwise create a few glitches while the change propagations stabilize. In the loose world of UI, this isn't such a big deal, but the FRP community takes this much more seriously.

Buffering in general is more trouble than its worth. The only exception are multi-valued data, which have to handled specially (add/remove notifications don't fit into the "change" notification category).

top

My experiences with writing apps in Flapjax made me realize they are indeed a big deal for RIAs, but maybe you think that is sufficiently different from the loose world of UIs :) I'm still on the fence with graph reconstruction semantics for things like conditionals. You probably want synchrony per time-step that preserves event clock ticks and a notion of atomicity to prevent glitches within those ticks. Topological eval order is probably the most efficient way to achieve it for single processors (I haven't found it to be a terrible hit, and if you're serious about the implementation, adding JIT support to dynamic graphs for safe parallelism would probably take you far).

It's interesting to consider transactions for expressions within time steps as the implementation strategy, or even over multiple time steps. I've been thinking about this wrt optimistic concurrency for multi/manycore SMPs and even allowing imperative coding fragments. A lot of the target applications (and indeed original motivations) for this style of code are visual, so I've been a bit stuck on what fairness guarantees you want between different visual components. Synchronizing rendering between them seems like an odd task for developers as ideally rendering should be expressed just as a form of sampling a (temporally) continuous data structure. Instead of explicit synchronization, I've been thinking of nested constraint based prioritization, which means explicit resource scheduling can be done underneath.

I guess its how you define

I guess its how you define RIAs, which I'm not really happy with, as even gmail seems way too slow for me. Given my experience with UI (more animation-based), invalidate/refresh is a much better strategy than sending accurate deltas. With invalidate/refresh, we just need to track possible dependencies, we can be conservative, we don't have to do a lot of bookkeeping, and we especially don't have to deal with an explicit dependency graph.

Agreed buffering opens a big can of worms, but...can you provide some specific non-toyish examples of where glitch-free buffering is actually needed in a RIA (assuming the database isn't a part of the FRP code)?

But does it? Depends on how

But does it? Depends on how Excel is implemented. A topological sort of the cell dependencies solves this problem. Kenny Tilton's "Cells" system will always compute b before c. So the cells are not like asynchronous channels since they have a deterministic ordering.

What of cycles?

How are cycles handled? Are they banned, handled with iteration-until-convergence, etc?

Check out QCalc

Some here might find Albert Graef's QCalc written in the functional programming language Q, interesting. Although not really 3D, you can stick GUI elements, gnuplots, mplayer files, or any function you can create in Q (the entire language is available) in cells in an interactive way. Since Q has threading libraries, I'm pretty sure it would be possible to network individual cells in one spreadsheet to various cells in other spreadsheets over a network. Unfortunately, only *n*x or MAC users can run QCalc right now.