guy steele on how (not) to think of parallel programming

The first segment of the talk has some tasty anecdotes to explain how far we've come.

The middle segment introduces some of the problems Steele has been wrestling with regarding parallel programming, and how best to describe (for human and machine consumption) how to think about it. The "accumulator paradigm" stands out as a particular hint of something gone wrong for the parallel programmer. There are some interesting quips in here about things like "why didn't Fortress start from Haskell" and "Say, just how important are associative operators, anyway?"

The final segment shows off some Fortress niceness.

The parting words are interesting.

The talk is presented here with video and slides.

If I may speculate or editorialize or otherwise just make my own notes here, briefly, if for no other reason than to encourage you to watch the video and figure out why I'm wrong:

I note the prominent "Oracle" brand on the slides and, free associating, it reminds me of something related:

Stonebraker (perhaps someone might offer some links) had some recently semi-infamous criticisms of so-called no-SQL databases. His recent work, last I heard, was on SQL solutions for column-oriented stores of very large data sets. He is keen on stuff like ACID properties. He sees (my weak paraphrase, not his words) stuff like map-reduce hype as a disappointing giant step backwards in the public discourse.

Oracle, among other things, is in the database business and I would guess is concerned about the kinds of issues that Stonebraker talks about.

Codd wanted, in his abstract formulations, a practical separation between a logical model of data and the physical realization of operations over this, plus transactional properties, plus a "relational" foundation that was so abstract in conception that SQL is but an example, not a final word. Indeed, SQL is horribly dated for its failure to be cleanly integrated with a surrounding and more general programming language. The way it is used in various web frameworks is generally disappointing. Personally, I would guess it is that awkward integration that, until recently, fed a lot of no-sql hype.

The Fortress project has landed in a potentially great place[*] for it. Steele's notions of parallel programming discussed in the talk, married to big data.... Perhaps something interesting will come of it.

[*]: Although I do not think it is good if Steele's work going forward cannot respect and hopefully advance the cause of software freedom. At Sun, the Fortress project had a very good chance of doing so, as far as I can tell. This matters for language theory academics and others in a forum like this because, presumably, we are not here to kibitz for the purpose of providing free insight to a company that will use that insight to harm the freedom of ourselves and others.


MapReduce and Parallel DBMSs: Friends or Foes?

Stonebraker (perhaps someone might offer some links) had some recently semi-infamous criticisms of so-called no-SQL databases.

Perhaps this one? MapReduce and Parallel DBMSs: Friends or Foes? [cacm]

This older presentation

This older presentation (ICFP 2009) contains much of the same material as the second half of the InfoQ talk, without the ancient history anecdotes and probably with a bit more depth. I watched this one first, and noticed the repetition when watching the InfoQ one. Frankly, I preferred this older one; the archeology was only mildly interesting.

Previously on LtU.

This is about education: selling Fortress & higher abstraction!

What Steele is doing here is trying to get the 99.999% of the audience out there to change. The talk challenges some of our old and primitive beliefs while making sure that we do not reject what is said. Hence the introduction, and the way the presentation is developed and concluded.

Another way to look at this is to say this is marketing: Fortress has a high chance of being rejected as the next Fortran unless the applied math developers start to worry. This talk is meant to make them worry, while giving them enough that they do not reject the message. It is also marketing in that it presents Fortress as a solution for non-numerical problems as well. And do not get me wrong, this is positive marketing!

Having said that, the path of "reconversion" to higher abstraction is hard and long, and there is still a real chance of failure to convert the masses to higher abstraction. Therefore I would say: great for Oracle to "sponsor" this!

implementation, implementation, implementation

Steele's points deserve to be taken to heart much more in the development of languages and language constructs targeted at parallelism and concurrency. I also really respect his focus on numerical computing, which I think has been neglected in contemporary language development to some extent.

At the same time, I'm really disappointed in how slowly the implementation of Fortress is moving. This is not a criticism--I can only imagine the issues involved given the Sun-Oracle transition, as well as other things--but it has effects nevertheless. It's difficult to create real use when there isn't a practically useful instantiation of the concepts. I keep crossing my fingers that something will materialize out of nowhere in the not too distant future, but I'm not holding my breath anymore.

Accumulator

I think the stuff about the accumulator is the most interesting part of the whole talk. I have thought about how this relates to my language Babel-17, because Babel-17 has built-in support for (currently only sequential) accumulators. It seems that it should be easy to take Babel-17 in a direction where the accumulator is the only part that needs to be changed to switch between sequential accumulation and map-reduce-style accumulation. For a slightly more detailed exposition of this, see my blog.

Recommend this for the home page.

A great presentation - deserving of a link from the home page.

Any editors concur?

Eliminate the Accumulator?

Guy Steele presented an argument to eliminate accumulators to benefit parallel programming, but I am not convinced it is completely required. The properties of associativity and commutativity allow us to optimize loops such as SUM=0; DO i=1 TO 1000000; SUM=SUM+X(i); END DO, although Sum(X) is easier to code and to optimize, so we should use it.

However, dividing the loop into a binary tree of operations did not eliminate the accumulator; it created one for every binary operation (half a million to add the leaves), reduced by half after one addition cycle, reduced by half again in the next cycle, and so on, until one accumulator survived.
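
For concreteness, here is a minimal sketch of that binary-tree summation (in Scala; the code is mine and purely illustrative, not from the talk): each pass adds adjacent pairs, so the number of live partial-sum "accumulators" halves on every cycle until one survives.

    // Illustrative sketch: tree-shaped summation. Each pass adds adjacent
    // pairs of partial sums, halving the number of "accumulators" per cycle.
    def treeSum(xs: Vector[Int]): Int =
      if (xs.isEmpty) 0
      else {
        var level = xs                                   // the leaves: one partial sum per element
        while (level.length > 1)
          level = level.grouped(2).map(_.sum).toVector   // half as many partial sums after each pass
        level.head
      }

    // treeSum(Vector(3, 1, 4, 1, 5)) == 14, the same result as the sequential SUM loop,
    // but the additions within each pass are independent and could run in parallel.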

In fact, Steele said that doing operations in parallel often requires more temporary memory (e.g., more accumulators) than linear loops. An accumulator in an arithmetic unit is temporary storage. As computers have become more complex, designers have added accumulators for each arithmetic unit plus several layers of internal cache as temporaries.

Thus, Steele's call to eliminate accumulators meant for language designers to prevent use of accumulators, to make them less beneficial to programmers, or at least make them less attractive to use. Unfortunately, human psyche may rebel at the idea. In my head is an age accumulator, and every morning it clicks "another day." I don't actually know the number of days, but that accumulator grows each day.

Moreover, spreadsheets depend on the idea of many accumulators (cells), and they are probably the most popular programming tools ever invented. In fact, there is already a patent application for a parallel execution mechanism for spreadsheets.

Rather than eliminating the accumulator completely, perhaps we should include spreadsheet semantics in languages to help with parallel computing. It is only necessary to make a spreadsheet object that contains, among other things, cell objects and the ability to evaluate expressions within and among the cells.
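
A minimal sketch of what such spreadsheet-style evaluation might look like as a library (Scala; the Sheet/cell API here is hypothetical, invented for illustration):

    // Hypothetical sketch: a sheet of named cells whose formulas may read other
    // cells. Cells with no dependency between them could be evaluated in parallel.
    class Sheet {
      private var formulas = Map.empty[String, Map[String, Int] => Int]

      // Define a cell; its formula sees the current values of all cells.
      def cell(name: String)(formula: Map[String, Int] => Int): Unit =
        formulas += name -> formula

      // Naive evaluation: re-evaluate every formula until values stop changing
      // (assumes no circular references).
      def evaluate(): Map[String, Int] = {
        var values  = formulas.keys.map(_ -> 0).toMap
        var changed = true
        while (changed) {
          val next = formulas.map { case (name, f) => name -> f(values) }
          changed = next != values
          values = next
        }
        values
      }
    }

    val s = new Sheet
    s.cell("a")(_ => 2)
    s.cell("b")(_ => 3)
    s.cell("total")(v => v("a") + v("b"))
    // s.evaluate() == Map("a" -> 2, "b" -> 3, "total" -> 5)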

I believe there are many simple expressions such as SUM(X) that can be evaluated in parallel. In addition, I believe there are also problem sets that cannot be reduced to a simple equation, and they will require many separate expressions, which may also be solved in parallel (e.g., things done in spreadsheets, or the stress model of an airframe).

Reactive spreadsheet-like

Reactive spreadsheet-like semantics are becoming more popular, perhaps due to WPF and JavaFX and publish/subscribe. Leo Meyerovich (Flapjax), Sean McDirmid (Bling, SuperGlue), Conal Elliott (functional reactive programming), and myself (reactive demand programming) are at least a few LtUers who have been developing systems based upon reactive models that propagate data.

But, while these models do expose opportunities for parallelism, taking effective advantage of those opportunities is not trivial. Similar to pure functional programming, the challenges are of granularity (vs. overheads) and of dependencies between elements. In a reactive model, dependencies create issues for parallelism for two reasons: First, evaluating updates in the wrong order means doing a lot of unnecessary rework. Second, it is preferable to have consistency properties - where all references to a dependency (even indirect references, through other 'cells') are treated as sharing the same value at any given observation.

Automated parallelization is feasible, but takes a sufficiently smart optimizer and access to the dependency graph. The former is always a problem, and the latter could be a problem if the dependencies are dynamic or modular or if the reactive system is provided via library.

Note: you should fix the patent link.

Thanks, fixed & ...

Non-trivial, yes. That's why the best are working to understand and publish about computing with cores and clouds. I suspect the future will bring both a plethora of parallel-programmer niches and technology to code-serial and run-parallel. The code-serial technology is suited to the kinds of problems that run on SIMD computers, but not limited to SIMD. But full coverage will probably require languages made for niches.

Aqueuemulators

Queues are implicit in many concurrency paradigms (actor message passing, streaming, flow-based programming). I consider them one of the greater culprits hindering our ability to leverage concurrency as an opportunity for scalable parallelism.

Queues are used to regulate access to shared resources, such as state or access to a printer. With the printer, there is no getting around some sort of serialization. But there are often ways to avoid use of internal state.

The worst problem with queues is that they require use of stateful idioms in order to solve simple problems. If you use a queue, you'll also need an explicit accumulator in order to keep information about past messages, correlate ongoing behaviors, or fuse data from multiple sources. This introduces a Catch-22 situation: you use the queue to protect the stateful idioms, and you use stateful idioms because you are being fed by a queue. To break out of this scenario requires reconsidering communication as a whole - how side-effects are expressed.

Eventual consistency is the right basic idea, IMO: require that the side-effects be commutative and idempotent, like storing information into a data set. Logical monotonicity is a powerful basis for side-effects, and non-monotonic behavior can be expressed as occurring over monotonic time through temporal semantics. Options other than Bloom with logical monotonicity include use of single-assignment variables (declarative concurrency), temporal concurrent constraint programming, and some forms of reactive programming.
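
As a tiny illustration of the commutative-and-idempotent requirement (my own sketch in Scala, not code from Bloom or any of the systems named above): if the effect is "add this fact to a set", then delivery order and duplicate delivery stop mattering, which is exactly what lets replicas converge.

    // Illustrative sketch: an effect expressed as set insertion is idempotent,
    // and set union is commutative and associative, so replicas converge no
    // matter how updates are ordered, duplicated, or merged.
    case class Facts(entries: Set[String]) {
      def record(fact: String): Facts = Facts(entries + fact)       // idempotent
      def merge(other: Facts): Facts  = Facts(entries ++ other.entries)
    }

    val none     = Facts(Set.empty)
    val replicaA = none.record("sensor-1 -> 42").record("sensor-2 -> 7")
    // Different delivery order, plus one duplicate delivery:
    val replicaB = none.record("sensor-2 -> 7").record("sensor-1 -> 42").record("sensor-2 -> 7")

    assert(replicaA.merge(replicaB) == replicaB.merge(replicaA))    // same converged state
    assert(replicaA.merge(replicaB) == replicaA)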

cold and hot on e.c.

my small layman readings and dabblings have led me to be mostly frustrated with how i see eventual consistency leading to bad user experience, and how i see development teams adopting it but not apparently paying attention to the nuances which are required to really make it give correct answers. it currently feels to me like something that only really smart and careful and motivated people should be trying to use. especially since the tools around things like nosql dbs are so lacking afaict, at least vs. the sql ecosystem.

but if somebody came along and managed to create a system that helped to enforce all the things that need to be enforced in order to have e.c. turn out well, and a system that helps highlight what sore thumbs need to be addressed, then that would be like gangbusters.

i see development teams

i see development teams adopting [eventual consistency] but not apparently paying attention to the nuances which are required to really make it give correct answers

Ditto for multi-threading and locks.

Eventual consistency, from a language perspective, means a major change in how side-effects are expressed. Correspondingly, there will be new idioms and patterns for its effective use.

I suspect a lot of people are 'inspired' by e.c. but manage to totally misunderstand it, similar to being 'inspired' by SOA but thinking it's just adding an XML + socket API to their application. I agree that better tooling would help - both for explaining the idea and for getting it right.

i see eventual consistency leading to bad user experience

Eventual consistency, applied to UI, should mean you can see your own updates with good responsiveness (after all, one reason to favor EC is all about availability and partitioning tolerance) but your view may be 'eventually' corrected to accommodate concurrent updates by other agents, and everyone's views will become consistent (eventually) if updates halt. Under-the-hood, some sort of replicated database or replicated stateful objects are likely, otherwise you'll have nothing to call 'consistent'. However, the degree of replication may vary.

Spreadsheets and scenegraphs are well aligned with eventual consistency, as are multi-player online games (where views of different players must be mostly consistent, but some slippage is allowed for variable lag).

This sounds related to the

This sounds related to the damage/repair idiom I've become addicted to for all my reactive/live programming projects. Inspired by iterative data-flow analysis, you represent the state and computation of a program as a bunch of dynamically interdependent nodes. Nodes can come and go, and can be organized in hierarchies, though this is typically a special dependency. When a node changes in state, all of its dependent nodes are reprocessed until the change works its way completely through the system. Nodes memoize their previous state so they can tell when they've changed.

The system is basically inconsistent but still active while a change is being processed; there are no transactions or atomic updates.
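
A minimal sketch of the damage/repair idiom as I read that description (illustrative Scala, not code from any of the projects mentioned): dirty nodes are reprocessed, and a node damages its dependents only when its memoized value actually changed.

    // Illustrative sketch of damage/repair: a node recomputes from its inputs,
    // memoizes the result, and marks dependents dirty only on a real change.
    import scala.collection.mutable

    class Node(val name: String, compute: () => Int) {
      var value: Int = 0                       // memoized previous state
      var dependents: List[Node] = Nil
      def repair(dirty: mutable.Queue[Node]): Unit = {
        val fresh = compute()
        if (fresh != value) {                  // did we actually change?
          value = fresh
          dirty ++= dependents                 // damage propagates along real changes only
        }
      }
    }

    // Keep repairing until the change has worked its way through the system.
    // No transactions or atomic updates: the graph is transiently inconsistent.
    def repairAll(initiallyDirty: Seq[Node]): Unit = {
      val dirty = mutable.Queue(initiallyDirty: _*)
      while (dirty.nonEmpty) dirty.dequeue().repair(dirty)
    }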

Weak forms of Consistency

related to the damage/repair idiom I've become addicted to [...] When a node changes in state, all of its dependent nodes are reprocessed until the change works its way completely through the system.

I too use a damage/repair idiom, though currently without memoization (I just track time of change, since I don't assume Eq class). But I feel it is important to also ensure a few weaker forms of consistency to simplify reasoning about the program:

Limited snapshot consistency: Within an instant, all views by one node of any other node are consistent. However, this is not full consistency because it is not transitive: if Alice observes Bob and Charlie, and Bob also observes Charlie, then Alice's view of Charlie might be from a different time than Alice's view of Bob's view of Charlie. This form of consistency is just enough that developers can reason locally about glitches (i.e. glitch-freedom at the reference level).

Spatially bounded consistency: A 'node' in my implementation is a vat of agents, rather than an individual agent. Snapshot consistency applies for all the agents in a vat, which alleviates pressure to use monolithic agents just to increase consistency. Anyhow, for the 'spatial' consistency I also guarantee that all agents internal to the vat have reached 'eventual consistency' before external vats may observe the state of any agents within the vat. That is, vats don't see the intermediate states of other vats. This makes macro subprogram-level composition much easier to reason about.

Temporally bounded consistency: There is an upper bound on how long (in logical time) it will take to receive an update from another vat in the same process. This allows reasoning about eventual consistency within the process. The total time to reach eventual consistency in the whole process is still potentially unbounded (since it may require an unbounded number of inter-vat updates, e.g. for cyclic relationships), but if you arrange the program such that you can reason about number of cross-vat hops then you can reason about time to achieve consistency. With some discipline across vats you can use these temporal bounds to achieve determinism (albeit, at some cost to latency or parallelism). (This upper bound is configurable per vat.)

I don't ensure eventual consistency at the implementation layer. I assume it. Ensuring it is the job of static analysis of code or reasoning by developers (similar to protecting against infinite loops in other languages).

Anyhow, I use atomic updates under the hood to achieve the aforementioned consistency and parallelism. But I do so at a remarkably coarse granularity. Updates to remote vats accumulate in local queues, and at the end of each instant (after establishing consistency locally) there is one atomic operation per updated remote vat that delivers a whole queue of updates - potentially updating dozens of bindings. Incoming updates are only processed once at the beginning of each instant, and all updates are processed before any reactions. Basically, each 'instant' gets an atomic view of everything outside the vat, at cost of very few atomic operations overall. (Within the vat, I similarly have multiple 'phases' per instant in order to propagate updates breadth-first and establish internal consistency.)
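
To make the coarse-grained batching concrete, here is a small illustrative sketch of how I read it (my own code, not the actual implementation): updates to remote vats only accumulate locally during the instant, and the end of the instant performs one hand-off per updated remote vat, delivering the whole batch at once.

    // Illustrative sketch: per-instant batching of cross-vat updates.
    import java.util.concurrent.ConcurrentLinkedQueue
    import scala.collection.mutable

    final class Vat(val name: String) {
      // Batches arrive whole; a receiving vat never sees half of another vat's instant.
      val inbox = new ConcurrentLinkedQueue[List[(String, Int)]]()
      private val outgoing = mutable.Map.empty[Vat, List[(String, Int)]]

      // Called freely during the instant; nothing leaves the vat yet.
      def send(to: Vat, binding: String, value: Int): Unit =
        outgoing(to) = (binding, value) :: outgoing.getOrElse(to, Nil)

      // End of instant: one enqueue per *updated* remote vat, each carrying
      // a whole queue of binding updates.
      def endInstant(): Unit = {
        for ((to, batch) <- outgoing) to.inbox.add(batch.reverse)
        outgoing.clear()
      }

      // Start of next instant: drain all incoming batches before reacting to any of them.
      def beginInstant(): List[(String, Int)] = {
        val updates = mutable.ListBuffer.empty[(String, Int)]
        var batch = inbox.poll()
        while (batch != null) { updates ++= batch; batch = inbox.poll() }
        updates.toList
      }
    }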

there are no transactions or atomic updates

That's only impressive if you get some sort of parallelism. How much parallelism do you currently get?

That's only impressive if

That's only impressive if you get some sort of parallelism. How much parallelism do you currently get?

This is purely UI programming, so it's the UI that becomes inconsistent without transactions or atomic updates. I haven't done anything with parallelism yet since it doesn't match this domain, but I definitely think it's feasible:

Have multiple threads repair nodes; don't worry about locking down the state of each node, just make sure your dynamic dependency tracking (very fine grained) is safe. If a node reads the state of a node that then changes during repair, it becomes dirty and we try again.

Now let's say you have lots of nodes and you don't know statically how they are related or even what order they should be processed in. You just run them on random repair threads, retry nodes that were processed too early, and eventually everything works itself out. The efficiency of such a careless approach is a good question; I'm sure it could be really bad if the nodes are heavily interdependent and a more systematic parallelization/ordering is merited (the answer will still be right, but you will suffer a performance penalty). However, the approach itself is very easy to apply, which is not the case with most other parallelization technology.

Three stooges Bling?

Without parallelism, your UI should at least be inconsistent in a deterministic way, which is nice for verification and validation. That wouldn't hold once you introduce parallelism, so you might end up favoring some weak guarantees about the flavor of your inconsistency.

Efficiency is a huge issue. Propagating updates in a poorly chosen order can be algorithmically more expensive, even assuming a nice DAG shape in the bindings.

My use of breadth-first propagation is far from optimal. For example, assume the relationship graph (with A→B meaning B depends on A): A→B, B→C, C→D, A→D, D→E, E→F. Assuming a single update to A, the ideal update order is B,C,D,E,F. However, the breadth-first update is: B,D; C,E; D,F; E; F, which is 60% worse than optimal.
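
For the curious, that count can be reproduced mechanically (a quick illustrative sketch using only the edges from the example above):

    // Reproducing the example: breadth-first propagation of one update to A
    // recomputes 8 nodes, versus 5 for the ideal order B, C, D, E, F.
    val dependents = Map(
      "A" -> List("B", "D"), "B" -> List("C"), "C" -> List("D"),
      "D" -> List("E"),      "E" -> List("F"), "F" -> Nil)

    def bfsRecomputes(changed: String): List[String] = {
      var order = List.empty[String]
      var queue = dependents(changed)
      while (queue.nonEmpty) {
        val n = queue.head
        queue = queue.tail ++ dependents(n)   // naive: no de-duplication of pending work
        order = order :+ n
      }
      order
    }

    // bfsRecomputes("A") == List("B", "D", "C", "E", "D", "F", "E", "F"),
    // i.e. the same 8 recomputes as the waves B,D; C,E; D,F; E; F above.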

(For reactive demand programming, the scheduling problem is difficult due to the dynamic and bi-directional nature of each relationship. I don't expect to really tackle this problem until I'm past bootstrapping a compiler and runtime for an RDP-based language. BFS, for now, has the advantage of keeping it simple. Isolating updates internal to each vat also mitigates epileptic update storms.)

Anyhow, my expectation is that a bunch of extra threads making random repair walks won't much improve productive throughput. In fact, the opposite may occur, with threads competing to update objects or convoying updates. Three stooges stuffing a turkey comes to mind.

Bling doesn't do this, it

Bling doesn't do this; it relies on WPF's dependency tracking, which is fairly simple and non-concurrent. I've tried node repair for a compiler, but for being incremental, not for concurrency. But say you have a large project that you are compiling, and you want to distribute it across 16 cores. For a language like Scala, you can't compile different files very easily in parallel; there are just too many ways the files could depend on each other via the type system. And the Scala compiler is slow enough that users could get a real benefit from some parallelism.

When you discover a dependency on another node that has not been processed yet, you can process that node in the thread that discovered the dependency and take it out of the other thread's queue. This approach falls down for symbol table accesses, so you just record a dependency on the symbol table entry, and when another node is processed that modifies the symbol table in a relevant way, you re-process the dependent node. As an optimization, you can suspend node processing when it encounters a symbol table entry that doesn't have a binding yet, and take a redundant computation hit if there is shadowing.

Now take a large compilation job and process source files (or some other large-grained tree node) in 16 concurrent threads. Since the files aren't really independent, there will be some suspending and reprocessing, and you take a hit from recording dependencies dynamically as well as from detecting, via re-processing, when node state has changed. But I think one should be able to eke out a 4 or 8X speed-up, which isn't bad. This could be done without drastic changes to the compiler.

I would love to do this experiment, but the compilers I work with today wouldn't really benefit from it. A C# compiler is crazy fast, but if I was still doing Scala work, this would be a nice project.

Compilers are not "stupidly parallel," so if you could get a performance boost there, then you could probably get some boosts in other domains.

Parallel compilation

Parallel compilation generally can operate under the assumption that there are no new changes while you're still processing the recent updates. This is a nice simplification that avoids some of the worst behaviors.

A common scenario in a reactive system is that several related nodes update in close proximity (but not at the exact same time), or a single node is updated several times very rapidly. If threads are performing random repair walks, it won't be uncommon that the only available updates are all centralized in some tiny little niche in the dependency graph.

You'd need synchronization just to keep old updates from overwriting new ones, and you don't want a bunch of threads wasting their time convoying updates down a local chain of bindings when some change arrives somewhere else.

That said, there are dynamic solutions that could make this work well enough. You could probably use a history of update collisions to create single-threaded 'update territories' - a subset of nodes where a high rate of collisions are seen. Only one thread would be in an update territory at a time, except to occasionally challenge them. This would be more analogous to predator territories in nature.

I'm not mixing parallel and

I'm not mixing parallel and incremental compilation together yet. They do depend on the same infrastructure and it might be possible to do both at the same time, but if you are incremental, do you really need to be parallel also? Let's consider parallel in the case where the program isn't changing: you just might process nodes in the wrong order, requiring re-processing to get them into the correct state. In this case, there isn't a lot that we don't know about the right order, just some holes related to dependencies that occur indirectly through the symbol table.

Compilation is a much easier problem in general because when you are inconsistent, your nodes are often in an error state and you can simply suspend all nodes that depend on erroneous nodes.

Leveraging coherence in mutations is a great idea and I've thought about it some for reactive incremental processing. But it doesn't seem to make sense for compilation, where the nodes are rather large grained and most of the dependencies are obvious (well, the nodes do form a tree!). Also, we are only talking about basic SMP, not NUMA, with 16 concurrent threads. If your changes are small and frequent, they generally work their way through the repair processing rather quickly; you don't need to do anything very fancy.

Incremental?

I'm a bit confused where incremental came up. My example user story for 'incremental' is: Assume I'm performing an inner join between sets S and A. Then I add an element e to S, and wish to reactively propagate the update. Rather than freshly recomputing the join using the new value S' = S∪{e}, I want the compiler to recognize this case and perform just the additional join test between {e} and A.
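
In code, the user story is roughly this (illustrative Scala; Row and the join helpers are made-up names):

    // Illustrative sketch of incremental join maintenance.
    case class Row(key: Int, payload: String)

    // Full recompute: join every row of s against a.
    def fullJoin(s: Set[Row], a: Set[Row]): Set[(Row, Row)] =
      for { x <- s; y <- a; if x.key == y.key } yield (x, y)

    // Incremental update: when e is added to s, only e needs testing against a.
    def deltaJoin(e: Row, a: Set[Row]): Set[(Row, Row)] =
      for { y <- a; if e.key == y.key } yield (e, y)

    // Maintained result: fullJoin(s + e, a) == fullJoin(s, a) ++ deltaJoin(e, a),
    // which is the rewrite I'd want the compiler to recognize.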

I'm assuming you mean something different by it. What is your meaning?

[edit] I suppose it's a relative term. IIRC, you mentioned a while back that Bling involves a whole-world recompute each time step... which is less incremental than propagating updates to specific bindings, but more incremental than recomputing the world from its entire history of updates.

My language design goal has always been Internet-scale and zoomable user-interface. Recomputing the world for a small update was never on the table.

if you are incremental, do you really need to be parallel also?

To my understanding, certainly, yes, we want to support both optimizations of incremental computation and utilize our vast sea of compute resources. This applies to regular evaluation, and I don't see why it wouldn't also apply to compiling code for a live program.

Compilation is a much easier problem in general because when you are inconsistent, your nodes are often in an error state and you can simply suspend all nodes that depend on erroneous nodes.

The vision I have for live programming is that the old code will continue to be used until the new code both compiles and passes all its tests. (If it does both, it is deployed immediately. If you have a problem with that, write better tests. If necessary, include human approval processes among the tests.)

Sorry, I'm attacking two

Sorry, I'm attacking two different problems with the same infrastructure, and I'm trying to simplify the discussion by not talking about both at once. Just to speed up perf via parallelization, you don't consider normal changes to S, only updates to S that are revealed because you have more information. If we are talking about live programming, which I consider to basically require incremental, then S could change by being edited.

To my understanding, certainly, yes, we want to support both optimizations of incremental computation and utilize our vast sea of compute resources. This applies to regular evaluation, and I don't see why it wouldn't also apply to compiling code for a live program.

In both cases, if you are making sustained small changes to a program via live programming, then each change should require very little time to begin with, so multi-CPU parallelization isn't going to provide a noticeable benefit. GPU parallelization is still profitable for UI, but that is a completely different domain.

The vision I have for live programming is that the old code will continue to be used until the new code both compiles and passes all its tests. (If it does both, it is deployed immediately. If you have a problem with that, write better tests. If necessary, include human approval processes among the tests.)

Your vision is similar to mine except for the test gateway part. But that is a debate for another day :).

if you are making sustained

if you are making sustained small changes to a program via live programming, then each change should require very little time to begin with

If the implementation is inlining functions, performing runtime specialization based on tracing and hotspots and so on, there might be a lot to update after certain changes. An edit to a DSL subprogram might require reconfiguring a whole framework. The size of the edit seems to matter less than the number of dependencies.

Anyhow, I would agree that Bling probably doesn't need both parallelization + incremental updates. But that's due to Bling's limited scope. For example, your comment above makes the implicit assumption that all updates come from a single user.

When I talk about live programming or reactive programming, I'm not limiting myself to the assumptions made for Bling. Apologies, if this is a blow to your ego. ;)

Your vision is similar to mine except for the test gateway part.

I'm curious: Do you have a different vision for the test gateway, or do you not envision one at all? (Naturally, the easiest way to pass the test gateway is to not write any tests.)

Bling has no live

Bling has no live programming story, and it's not very reactive; it is simply a wrapper around a few .NET technologies. The more relevant project here would be SuperGlue, which actually did live programming, as well as some of my older Scala IDE work. Definitely limited scope.

If you are talking about a small change having huge effects, there is less point in being interactive (and hence incremental), since anything that requires parallelization (beyond the GPU kind) is going to take some time to complete anyway. This is why I see the two problems as being detached.

Let's take large scale matrix computations, like building a model for speech recognition. That computation is going to take a few days to run, so it definitely has to be parallelized. Interactive? It doesn't really make sense, any non-trivial change will probably require that we rebuild the model again from scratch.

Most PL people (including Guy) who talk about parallelization have very limited domains in mind, while the non-PL people who actually do this kind of thing would (and in my case do) laugh at our assumptions for their specific domain that we didn't consider (e.g., speech recognition). So pick your domains carefully, and be prepared to be wrong.

I don't see test gateways as being very viable. Continuous testing sounds nice, but the tests will always be fairly incomplete; they often lie about the integrity of your program. Rather, live programming should be about building a program while it is running and testing it a bit at a time, both manually and automatically. You should have to press a button, at least, to go live.

A self-fulfilling prophecy

A self-fulfilling prophecy seems to be embedded in your assumptions, there.

If you pick your domains carefully and decide based on your domain choices to (for example) not simultaneously support incremental and parallelization, then of course your IDE is going to be bad at compiling something that needs heavy parallelization.

OTOH, if your reactive infrastructure was developed for general-purpose programming (in an age where CMP and SIMD processing are both popular), you'd already have support for various forms of parallelization, and there'd be little sense in not taking advantage of them in your compiler. So you'd handle the speech recognition model without any special difficulty. It might not be the most optimal compiler for a speech model, but it won't be less optimal than other general-purpose compilers.

Continuous testing sounds nice, but the tests will always be fairly incomplete; they often lie about the integrity of your program.

You certainly shouldn't depend only on unit tests for integrity. But a failed test always tells you something's wrong (either your code or the test). For my vision, all static analysis would also effectively be part of the test gateway - i.e. parse errors, pluggable and optional types, lint-like tools, et cetera.

For reactive programming, it also makes sense to have a lot of live tests - e.g. that can sound alarms if invariants are ever violated.

We are talking about

We are talking about different topics in parallel, let's focus.

I talk to people who are working in domains where perf is very important; they measure run-time as a matter of days rather than seconds or minutes. They are already using lots of incremental/iterative techniques, like hill climbing, and adding program changes on top of that would probably ruin their results. In that case, they develop using small data sets that can be run very quickly. We could forget about parallelizing compilation here: they are generally running with C++ (usually very fast) and, increasingly, NVIDIA's CUDA compiler (which is quite slow for some reason); they also want as little abstraction as possible because they really need to tune, tune, tune... they laugh at all the PL people who think abstraction is the solution and not the problem.

Now, going back to being incremental and parallelizing at the same time. To me, you make live programming incremental because you are interacting with your program in real user time, and then there had better be lots of UI components to help you read your program's behavior. There had better not be many of the throughput computations that typically benefit from parallelization, because then the user will have to wait no matter what tricks you pull. I would only make an exception for using the GPU for visualization and rendering tasks, since it provides an order-of-magnitude improvement when used correctly, rather than the linear N that you can get from using multiple cores/machines.

'Linear N' will in a few

'Linear N' will in a few years be more than an order of magnitude on even the cheaper desktop machines. Granted, GPUs will also up another order of magnitude.

I deal with live sensor inputs and sensor fusion - many megabytes per second from many sources, often with some sort of framing. Processing cannot be allowed to fall behind more than a soft bounded latency - we're forwarding a visual to a user after processing. But the processing is also easily parallelized. It would seem a waste to not take advantage. Extra number crunching power means more sensor data can be processed in real-time, or that more information can be extracted. The ideal infrastructure would allow us to run every CPU at near 100% and still have responsiveness. Even better if we can leverage cloud resources for extra processing.

If the reactive infrastructure is already designed for that, why not parallelize much of compilation? In some cases you won't be helped, but in most cases you won't be hurt by it either. If you thought of huge compilations as a possibility, you might even remember to add a progress bar.

That cores will scale is a

That cores will scale is a prediction that is by no means certain. We haven't yet seen core counts scale beyond a certain point, and there is no assurance that our problem is something solvable.

The difference between interactive and non-interactive is measured in magnitudes. Speeding your computation up by 4 or maybe even 16 isn't going to cut it; you need to speed it up by 100. Going to the cloud is even worse: your round-trip latency will kill the effectiveness of short real-time computations, so it is better to focus on throughput, where multi-CPU parallelization is actually useful.

Your best bet is to be incremental. As you collect little bits of sensor data over time, you incorporate them into your model; that should be very quick. If processing small updates takes too long, the solution is probably better incrementality rather than better parallelism.

If you have a naturally distributed parallel problem, like dealing with a sensor network, then managing each sensor separately and aggregating the results in real time makes more sense than turning it into a parallelization problem.

The difference between

The difference between real-time and non-interactive isn't measured in 'magnitudes'. It's measured in 'milliseconds'. E.g. if it takes you 110 milliseconds to process frames of a 10Hz signal, that is not real-time: you'll need to basically run it offline. If it takes you 90 milliseconds to process a 10Hz signal, you can run that in real-time (supposing you can dedicate a core). If you can reduce it to 30 milliseconds, you could process a few such signals to improve visualization or correlation between them.

Parallelizing can make a significant difference for real-time, interactive programs. A factor of 4 or maybe even 16 can often more than 'cut' it.

What matters is rate-of-change relative to rate-of-processing. The cost of processing the change can be up or down several orders of magnitude, so long as rate of change is equivalently slower, and still provide a very interactive experience. Sure, it isn't going to happen for delays of days, but 'interactive' can easily happen over seconds.

I certainly agree one should favor being incremental where feasible, because that often allows cutting down costs by an order of magnitude (albeit, often at huge costs to complexity). But parallelization still has a very useful place.

If you have a naturally distributed parallel problem, like dealing with a sensor network, then managing each sensor separately and aggregating the results in real time makes more sense than turning it into a parallelization problem.

Providing a parallel solution to a parallel problem means it wasn't a parallel problem? Perhaps you need to clarify, because what you say here makes no logical sense to me.

Parallelizing can make a

Parallelizing can make a significant difference for real-time, interactive programs. A factor of 4 or maybe even 16 can often more than 'cut' it.

I think fixed parallelization overhead in this case would eat into any benefit you might obtain since the computations are short (in the 100 ms range). I would be happy, though, if you could prove me wrong.

Providing a parallel solution to a parallel problem means it wasn't a parallel problem? Perhaps you need to clarify, because what you say here makes no logical sense to me.

I'm assuming your sensors are already distributed; aren't you then aggregating results as they make their way to the user? Is it one of those problems where you have to wait for all the data to come in and then compute an answer all at once? In that case, I'm not even seeing how parallelization is possible.

I think fixed

I think fixed parallelization overhead in this case would eat into any benefit you might obtain since the computations are short (in the 100 ms range). I would be happy, though, if you could prove me wrong.

My work last summer was a multicore framework for tree computations where, as the main challenge benchmark, we've been speeding up each pass in a sequence of 1ms tree traversals. You're right that it's hard and not something that most programmers should be doing -- we're packaging it up as an STL library (at the moment) and trying to figure out an even better interface for later (to automate packing, guarantee locality, etc.).

Going back to David's post about realtime and magnitudes, it's actually also an example of where we can help enable a computation to become realtime. E.g., current realtime animations try to use packaged interpolations (e.g., tweens); we enable more complex layout solving that better approaches full-on constraints. CSS semantics currently has this weird separation of layout-time constraints and render-time constraints: our stuff, if it works, helps remove the need for it.

Parallelism works well

Parallelism works well enough at the scale of a few milliseconds. Keep in mind that one millisecond is a few million cycles on a modern processor. Overhead for passing and receiving a message is only a few microseconds between busy threads. (It takes some careful architectural design, however, to only ask when data will be available and to get the scheduler to cooperate. I'm using temporal semantics to help control parallelization in addition to concurrency... e.g. so I can spin off a few parallel computations and schedule their consumer.)

Sensor data gets run through several pipelines of processing, and each stage in a pipeline can be effectively parallelized. The time it takes for information to pass through all the pipes before visualization determines latency (how old the data is when it reaches the user) and overall memory costs. However, it is throughput that determines whether the processing can occur online in real time.

It just so happens that a network of reactive bindings is a very pleasant way to express these pipelines, especially with regards to setup, tear-down, and data fusion (bringing data from multiple sources together, in real-time, to enrich a model). Indeed, I consider this a stronger use-case for reactive programming than is UI. It also makes a strong case for having a parallel reactive architecture.

Ideally, we could move a lot of code to the sensor or onto an intermediate network (overlay or cloud) so that aggregation and several of the pipeline stages are processed before they reach our local machine. That would give us a lot more processing power and throughput, and in general could save bandwidth. I'm nowhere near implementing this yet, but my RDP paradigm and language have been designed with code-distribution for both performance and disruption-tolerance as an end goal.

Haven't followed this thread

Haven't followed this thread in depth, but did notice two things worth mentioning:

Let's take large scale matrix computations, like building a model for speech recognition. That computation is going to take a few days to run, so it definitely has to be parallelized.

If manycore happens, one of the more significant wins will be taking currently offline algorithms and making them online. In our lab, two of our favorite examples are image processing and speech recognition. Most applications in these domains stitch together a whole host of algorithms, so we see a lot of diversity here. A particularly fun example I saw recently leads to the next point:

Interactive? It doesn't really make sense, any non-trivial change will probably require that we rebuild the model again from scratch.

I don't know much about speech, but I saw a mixed incremental + parallel image processing app about 3 weeks ago. If you think about something like video, it's fairly intuitive. How to express that is an interesting question (and I can send you some pointers offline).

I find this discussion of parallelizability confusing

From the literature, incrementalization is described as a two-compartment model, with the properties being quiescence and independence.

We don't even think in terms of parallelizability until we understand the grammar we're manipulating. For example, you can take advantage of a priori knowledge of noncircularity in [edit: restricted*] attribute grammars, since that makes data-parallel computation much easier. We could then apply the scan model to attribute grammars and reason about how much performance we could eke out.

* edited for clarification

Weak Memory Models

But I do so at a remarkably coarse granularity. Updates to remote vats accumulate in local queues, and at the end of each instant (after establishing consistency locally) there is one atomic operation per remote vat that delivers a whole queue of updates - potentially updating dozens of bindings.

This looks remarkably like the description of a weak memory model induced by a per-(hardware-)thread write buffer in multiprocessor hardware -- as described in these slides. It's no surprise; you're basically solving the same problem, but I'm wondering why those don't get mentioned more often here; there is a growing amount of research on and formalization of those models.

Bottom Line: You're right

I am tired of vendors and bloggers and the rest of the Internet programmerazzi going on and on about E.C. and completely misunderstanding it.

As I've said before, most programmers don't understand accounting systems...

Just the other day I was told by one of our newer clients that they don't have a normal "Charge Master" (list of charges), and that the only way they know how much something costs is once somebody gets billed in a particular calendar year, and because of this they only adjust their rates once per year (!). Their notion of a "Charge Master" is a list of goods and services and then an associated set of pricing rules that can be arbitrarily complex and dependent on arbitrary factors. I'm not making this up; billion(s) dollar a year companies operate this way.

This is the sort of thinking that goes into bloggers who spend all day blogging about E.C. and how great it is; they have no idea how to actually design systems well so they latch on to buzzword techno-jargon and try to impress with that. They never actually check their work.

As for a system that can enforce these things at the linguistic level, well, you will not fix the root problem, which is lack of problem domain abstraction. I'm not sure I buy the idea that a language should favor a particular CAP Theorem trade-off, either. Why pre-determine your system by the technology you're choosing? Languages are very malleable things, they don't need to be like hardware.

Root Problem

enforce these things at the linguistic level, well, you will not fix the root problem, which is lack of problem domain abstraction

I disagree!

Problem domain abstractions are never developed in a vacuum. Problems usually cross multiple domains and integrate existing systems, and our choice of abstractions will be greatly influenced by this need for integration. Further, problems are rarely well specified in advance. Software requirements are discovered iteratively, and there is pressure to edit existing abstractions and code rather than scrap and restart for each newly discovered requirement. Very little is invariant: iterative changes are probably happening in whatever domain abstractions your problem will integrate, but under additional burden of backwards compatibility. Under these conditions, a path of lesser resistance (doing what is easy) will tend to dominate both your own abstractions and those of most services and libraries you use.

In order to understand the importance of language level properties, we must first understand that our problem domain abstractions aren't created from whole cloth. The quality of the problem domain abstractions we develop is, significantly, an emergent consequence of our language and communications substrate, limited foresight, and imperfect processes.

Assuming a good-faith effort by developers, systemic issues with problem domain abstractions or cross-domain composition should be understood as a symptom of the tooling (the languages and processes), not a root cause.

not sure I buy the idea that a language should favor a particular CAP Theorem trade-off

CAP is not "pick two". It's "pick at most two". A language that doesn't favor one or two properties is a barrier to achieving any of the properties.

That is, once you start integrating existing services and domain abstractions, you'll tend to achieve the least common denominator of systemic features. You can do better by developing a framework or architecture to constrain development. But that will require reworking and adapting all your existing domain abstractions to fit the framework, and the end result of that pursuit is an ad-hoc, informally-specified, bug-ridden, insecure, slow implementation of half of whatever language you should have designed in the first place.

Language designs should make a choice. That's among the most effective ways to raise the least common denominator.

Languages are very malleable things, they don't need to be like hardware.

Languages exist in a social context - e.g. one of sharing libraries, integrating services. Mutating the language is hardly conducive to a shared understanding in a social context. I consider it well worth sacrificing the more extreme forms of language malleability (such as cross-cutting or aspect-oriented transforms, monkey-patching types, modifying the interpreter) in order to achieve nice properties across composition and domain integration.

I meant something else

I meant that a language that could model the physical trade-off directly in terms of a logical model, is better. People do this by hand today. Put it deeper into the logical modeling stack and you get tons of benefits for free. The problem today is that this means making average programmers compiler writers. That will change when big companies with big research departments realize that this is ultimately going to be the direction things will go in. In the past and even today, average programmers have failed to be compiler writers. And in the past and even today, the solution has been to create more compiler-writing jobs for professional compiler writers (in other words, non-average programmers).

Give it 30 years.

By the way, this is clearly not just a social problem. I don't believe anybody is sold on such an idea today. There is the technical / engineering challenge of understanding how to actually carry this out.

a language that could model

a language that could model the physical trade-off directly in terms of a logical model, is better

That's what I thought you meant. Yet I do not agree that such a language is 'better'.

Composition and coordination across domains is more important than adaptability to specific problems. Consistency is more important than suitability. Instead of adapting the language to the domain, adapt the domain to the language (cf. relational modeling). A 'better' language is one that is effective for more domains or offers nicer compositional properties.

A meta-language that can be tweaked in subtly incompatible ways for each specific problem domain isn't just technically difficult. It's a bad idea.

Sorry, I don't understand ;-(

I don't have in mind any meta-language that can be tweaked in subtly incompatible ways... I don't even know how that follows from what I said earlier...

Responsibility of a Language Architect

You have indicated that a language shouldn't commit to (or favor) a particular CAP trade-off. You have indicated that, instead, languages should enable developers to commit themselves to whatever trade-offs they deem desirable.

A language that commits to little but encourages building a logical model that makes specific trade-offs is essentially a meta-language.

Developers and application architects, enabled with tools to make trade-offs specific to the application or domain, will quickly leverage those tools to locally optimize. These are smart and clever people. They'll do a lot of clever things with those trade-offs.

Unfortunately, the trade-offs made by application architects and domain developers are made at a local scope. They don't see, or consider, whole system issues. So instead of easy integration due to consistent trade-offs and enforced compositional properties, you'll get discontinuity spikes at the interfaces between libraries, services, and applications.

There are only a few people with a vision scoped wide enough to take into account compositional properties across apps, services, and libraries. You might call their product an OS, a common framework, a paradigm, a language or standard library. Those are, in essence, all the same thing. The people who build them are language architects.

As the only people with the vision and opportunity to make an informed decision, it is the responsibility of a language architect to commit to particular trade-offs in order to achieve nice systems-level properties. To do anything else is to shirk responsibility, to pass the buck to people who, in general, lack the power, vision, and deep knowledge of old mistakes in language design.

The idea that a language designer should make decisions is an unpopular one. Why? Because it suggests the language designer knows better, with implied arrogance. Because a decision might slightly hinder some obscure niche domain, and people from that domain will be quite shrill in protecting their interests. Because the language designer could make a bad decision, and nobody wants to be responsible for that. Because hesitation is always easy. But the only path to 'better' languages is to make these decisions, make the mistakes, learn the design patterns and frameworks developed to work around those mistakes, and correct for them in the future.

In the end, no matter what, a language architecture makes trade-offs. If you hesitate to favor a CAP trade-off based on some misguided ideal about leaving decisions to others, then the chosen trade-offs will be inconsistent and you will unwittingly sacrifice nice compositional properties between services and applications. If you make the trade-off between flexibility and composition knowingly, and can provide a good justification for it, that's a different matter - at least then you're taking responsibility for your decisions.

We're not on the same wavelength here, that's the problem

I am thinking about things more along the lines of the history of CSAIL at MIT, and especially work done by Carl Hewitt in theory of organizations. Also, Joseph Goguen's work on theory of organizations as well.

I think you can't have abstraction without precision, and tolerating ambiguity (or discontinuity or whatever we're calling it) is unacceptable. But if there is going to be major innovation in the next 50 years, it is going to be in shredding everything you are criticizing here.

if there is going to be major innovation

if there is going to be major innovation in the next 50 years, it is going to be in shredding everything you are criticizing here

I suspect that will be AI-complete (because, what is in that criticism to shred, other than human nature?). But you're free to try.

I believe significant innovation is possible due to acknowledging the limitations of our processes and structurally compensating for them, reducing probability for assembly-line application architects to make mistakes that will hinder scalability, system responsiveness, resilience, safety, performance, and so on. But, first, one must acknowledge criticism, not 'shred' it.

Hard, open problems are what computer scientists should pursue

And we should certainly be well aware of existing theory that indicates limitations, but we also should try to better understand those limitations and know exactly what we've really proven; knowing exactly what we've proven is the evaluation process of "design" that is essential to our profession.

Do you mean Davids ( and Guy

Do you mean David's (and Guy Steele's) commutativity requirements are too strong / special?

I think he means that

I think he means that David's (and Guy Steele's) commutativity requirements need to come with language, architecture, protocol, libraries, idioms, and an NPAPI plugin.

Gah. the accumulator abstraction

Noting the discussion above...

The "accumulator abstraction" Steele discusses is a notational crutch which obstructs various ways of understanding and manipulating a program. He is making a far simpler but still quite profound point than the responses above suggest. I shall attempt to clarify. All errors are mine, not his. Perhaps someone will argue that I, not the others, have badly misunderstood :-)

Consider that I have an indexed sequence, S. Indexes are integers and let us suppose that they are in the range 0 to N for some non-negative N. For simplicity of informal discussion, I'm ignoring the possibility of an empty sequence.

The same computation can be expressed in many ways; here are two ways of interest:

associative_reduction(+, S)

and

answer = 0;  for x in 0..N do answer += S[x];  return answer;

Ok, if you are concerned about the empty sequence case, I guess the first expression of this concept could be:

associative_reduction(+, S, 0)
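
For concreteness, here is a minimal sketch of the two forms in Python (my rendering, not Steele's; functools.reduce merely stands in for the associative_reduction pseudocode above):

    from functools import reduce
    import operator

    S = [3, 1, 4, 1, 5, 9]   # the indexed sequence S, indices 0..N

    # Reduction form: no accumulator variable appears in the source.
    total = reduce(operator.add, S, 0)

    # Accumulator form: the loop threads an explicit 'answer' variable.
    answer = 0
    for x in range(len(S)):
        answer += S[x]

    assert total == answer == 23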

What is really profound about Steele's observation is these three points:

  1. The notation associative_reduction more closely resembles how a human might normally describe the desired result if reasoning about it mathematically. We use capital sigma notation when proving theorems more often than we use loop notation, for example.
  2. The notation associative_reduction is of great help to a simple-minded parallelizing compiler.
  3. The for loop notation is at least an annoyance and really, often, a serious problem for a parallelizing compiler. In effect, the compiler must be loaded with complexity in order to reduce the for loop notation to the associative_reduction code.

To this we add the observation that Steele is concerned with the domain of programming languages intended to be handled by parallelizing compilers.

We easily conclude that the for loop notation is inferior.

Nobody would reasonably disagree with all of that. What is new here is the question of whether we can generalize that observation. Steele suggests a novel "rule of thumb": watch out for accumulators in your programs and by extension your programming language design.

However it is that a parallelizing compiler might simplify the for loop to the more useful associative_reduction expression (or some equivalent), we know that it must in some sense eliminate the accumulation variable - the variable I called answer. Execution of the program, successfully parallelized, shall not have any state that is modeled in a simple way by the superficial semantics of that variable. That variable is a needless fiction, as far as the compiler is concerned.

And that variable is a needless fiction as far as a human is concerned. Its presence is obfuscatory to both man and machine. It is present, perhaps, only because we haven't tidied up the man-machine interface, making both sides more efficient.

Sheesh.

One interesting aspect of Steele's observation is that it harkens back to Iverson's Turing address on the humanistic side, and to Backus's, where the issue of implementation comes in.

Let me stop there and make one brief additional aside:

The continuity of tone, language, approach, etc. from Iverson through Backus and now Steele in this vaguely defined area is interesting and noteworthy, I think. I think it is significant that in not just this but in quite a few recent talks and writings Steele evinces a kind of love of all programming languages, each of which (my interpolation here) has something to tell us about computation, about human communication, and about how we relate those things in practical ways. I don't mean that Steele is some kind of programming language cultural relativist for whom no language is better or worse than any other -- far from it. Just that he walks a kind of high-altitude ridge with a very broad perspective on practical tradeoffs between expressiveness that benefits human understanding and expressiveness that aids machine understanding of programs. As one example, if you have a language in which you are often tempted to write code using the accumulator pattern (initialize, modify step-wise, return) in order to express concepts that mathematicians would express more simply using concepts like associative reduction.... your language may not be well suited for an environment where parallelization is highly valued.

My view on this is that

My view on this is that loops are there for enumerating what you are accumulating, and accumulators are there to specify the operation of accumulating. It is also not difficult to detect when a loop is in fact a sigma, at least it isn't in Babel-17: a loop is a sigma when the body of the loop does not assign to variables from an outer linear scope.

So basically, what is missing from most languages is explicit language support for accumulators. Once you have this, sigmas are just easy to detect special cases of loops.
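
A rough illustration of the detection rule in Python (this is not Babel-17 syntax; assume, as explicit accumulator support would allow, that the += update of the declared accumulator does not count as an ordinary assignment):

    items = [1, 2, 3, 4]

    def square(x):
        return x * x

    # Recognizable as a sigma: apart from the accumulator 'acc', the body
    # assigns only to the loop-local variable x.
    acc = 0
    for x in items:
        acc += square(x)

    # Not recognizable as a sigma: the body also assigns to 'last', a
    # variable from the enclosing scope that is not an accumulator.
    acc2, last = 0, None
    for x in items:
        acc2 += square(x)
        last = x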

Of course, this does not factor in that it might not be so easy to check if the accumulator is associative; actually, this should not be checked, but just specified by the programmer. The best solution would be to embed programming into an interactive theorem prover, and to prove things like associativity. I have done this, but I bet it will take at least another 10 years before these techniques hit the mainstream (but I think they eventually will), because currently they are just too time consuming.
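
As a tiny illustration of the kind of proof obligation involved -- in Lean rather than Isabelle, with an operator name invented for the example -- associativity of a user-defined operator might be discharged like this:

    -- A user-defined binary operator (here it just wraps Nat addition).
    def combine (a b : Nat) : Nat := a + b

    -- The obligation a reduction framework would want discharged before
    -- reassociating a fold over 'combine'.
    theorem combine_assoc (a b c : Nat) :
        combine (combine a b) c = combine a (combine b c) := by
      simp [combine, Nat.add_assoc]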

recognizing loops as sigma

It is also not difficult to detect when a loop is in fact a sigma,

Certainly. On the other hand, it is hard to tell in general when a loop is an associative reduction.

In the talk, Steele mentions the notion of being able to declare a user-defined operator as associative and commutative - even when the compiler can't verify this assertion.

Once you've gone that far, the programmer is being encouraged to think in terms of the projection of binary operators into parallelizable reductions. The compiler is charged with recognizing and handling those specially. For whose satisfaction, then, would our notation encourage the use of an accumulator when a more concise and familiar style of notation means less work for the compiler, and more clearly conveys the meaning and connotation to a human?
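
A sketch of what such an associativity declaration licenses the compiler (or a library) to do -- reshape the reduction tree -- in Python, with names invented for illustration; the result matches a left-to-right fold only if the asserted associativity actually holds:

    from functools import reduce

    def tree_reduce(op, xs, identity):
        """Reduce xs with op by splitting it in half; the two halves
        could be handed to different cores."""
        if not xs:
            return identity
        if len(xs) == 1:
            return xs[0]
        mid = len(xs) // 2
        return op(tree_reduce(op, xs[:mid], identity),
                  tree_reduce(op, xs[mid:], identity))

    nums = [1, 2, 3, 4, 5]
    assert tree_reduce(lambda a, b: a + b, nums, 0) == reduce(lambda a, b: a + b, nums, 0)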

The best solution would be to embed programming into an interactive theorem prover, and to prove things like associativity.

I don't think it makes sense to prevent programmers from making false assertions about the associativity and commutativity of operators by insisting that such assertions pass an interactive theorem prover. The problem is that the properties you want to prove must be defined in terms of some equivalence relation -- any equivalence relation. The choice of equivalence relation is, in effect, part of the program's specification.

So what, you say? Well, this has at least two suggestive implications:

An operator may reasonably be both associative and commutative, and not associative and commutative, depending on the choice of equivalence relation.
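
Floating-point addition is the stock illustration (my example, not the parent's): it fails associativity under exact equality, yet satisfies it under the tolerance-based equivalence a programmer often actually intends.

    import math

    a, b, c = 0.1, 0.2, 0.3

    left = (a + b) + c    # 0.6000000000000001
    right = a + (b + c)   # 0.6

    # Not associative under exact equality ...
    assert left != right
    # ... but associative under a coarser equivalence relation.
    assert math.isclose(left, right)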

It may be impossible to define the intended equivalence relation to your theorem prover -- no matter how good that prover -- without constructing painfully elaborate models of externalities not otherwise found or expressed in the program itself. That is, for a binary operator DOT and sequence S the programmer might be trying to say "reduce S using DOT; it doesn't matter what order you do things in or how the tree of paired partial results is shaped .... just reduce it any old way -- whatever answer you get is the one I want." The programmer can reasonably intend that even when, say for example, the numerical value of a result is different for each possible reduction tree. The programmer is essentially, in that case, just writing a non-deterministic expression -- asserting some condition of equivalence-by-intention among the possible results.

In that case, your theorem prover requirement has just kicked the can down the road. Instead of the programmer baldly asserting "assume DOT is associative and commutative" you are having the programmer define a "MYEQUIV" relation, presumably prove that it's an equivalence relationship, assert that that's the relationship in question, and then prove that DOT is associative and commutative under MYEQUIV. Don't we now have to apply another theorem prover to prove that MYEQUIV is the relation we really wanted?

A language that affords, but does not require, application of a theorem prover to verify some cases of associativity or commutativity declarations under well-known equivalence relations is fine with me. Sounds swell. But it is not a complete solution and, indeed, if required rather than afforded, it cramps a language needlessly.

Incidentally, map-reduce results in a web search engine are a great real world example of a case where a programmer wants to assert that a particular reduction operator is associative and commutative under a non-standard equivalence relation of interest -- and would be greatly hindered by having to convincingly explain the equivalence relation to a theorem prover every time the search algorithm is tweaked.

Minor disagreement

The programmer is essentially, in that case, just writing a non-deterministic expression [...]

And since accidental non-determinism is bad, a language shouldn't do what you're suggesting. Asserting to the compiler that a folded operator is associative (with undefined behavior if the assertion is false) should be distinct from requesting a non-deterministic fold. Though I agree with your greater point about not requiring a theorem prover.

disagreement poof

Asserting to the compiler that a folded operator is associative (with undefined behavior if the assertion is false) should be distinct from requesting a non-deterministic fold.

Aha! I think we might agree in a way.

I'm old school. I think the assertion of non-determinism relative to standard equivalence relations is immaterial to the compiler although, sure, it would be handy to a lint-style program. Checking is not compiling.

Language design is concerned with both compilation and linting, though. I therefore like the idea of a language design that allows one to note "I've asserted that operator X is associative and commutative although I avow it need not be under the standard equivalence relations".

What I don't much see is any reason to privilege that particular kind of annotation or any other annotation that really doesn't contribute to compilation per se. That is, I'd rather see a language that just allows somewhat arbitrary meta-data to hang off the phrase structures of a program than to see a language with a core concept of asserted non-determinism of folding with respect to standard equivalence operators. It's just practical. There is a long line of people with other proposals for interesting lint-helping annotations. Better to satisfy the whole lot at once than take them piecemeal.

Less disagreement, but still some difference

There are basically two approaches to get checked determinism and parallelism here: write the fold non-deterministically and prove it's deterministic (or claim it is, circumventing the checker), or write the fold deterministically and then have the compiler discover a parallel implementation (possibly with an unchecked hint). Since the unchecked hints are possible in both cases, the two are very similar.

Which is preferable would seem to depend on the intended semantics. If you're writing an arbitrary fold that will happen to be deterministic in certain cases, your approach seems preferable. If you're writing a specific fold that happens to be parallelizable in some cases, mine seems preferable. My guess is that this is the more common case, but I might be wrong.

Also, consider that the context might be parametric in the operation, and the parallel implementation might not always apply. How would you handle that case I wonder? Make the choice of determinism into a parameter, I guess.
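
One way to render "make the choice into a parameter", sketched in Python with invented names: the caller's assertion selects between a fixed-order fold and one the implementation is free to reshape.

    from functools import reduce

    def fold(op, xs, identity, *, op_is_associative=False):
        """Left fold in a fixed order by default; a reshaped (and thus
        parallelizable) reduction only when the caller asserts
        associativity of op."""
        if not op_is_associative:
            return reduce(op, xs, identity)   # deterministic order
        xs = list(xs)
        if not xs:
            return identity
        while len(xs) > 1:
            # Pairwise rounds; a real implementation could run each
            # round's pairings in parallel.
            xs = [op(xs[i], xs[i + 1]) if i + 1 < len(xs) else xs[i]
                  for i in range(0, len(xs), 2)]
        return xs[0]

    nums = [1, 2, 3, 4, 5]
    assert fold(lambda a, b: a + b, nums, 0) == 15
    assert fold(lambda a, b: a + b, nums, 0, op_is_associative=True) == 15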

"Best solution" was an

"Best solution" was an overstatement on my behalf, I guess. It depends on the scenario. In certain scenarios it is certainly "best" to be required to prove correctness (for example when building a library of mathematical algorithms where anyone may contribute), in most scenarios today such rigor is not required.

I agree that often you need an additional equivalence relation (besides the one induced by the language). Again, it depends on the situation if it is worth the effort even coming up with such a relation.

10 years might be optimistic, but I am working actively towards this goal. The 10 years are a very rough guess under the assumption that a group of about 3 or 4 people work non-stop towards solving this problem without having to account for the use of their time to anyone.

Loop notation is more readable than sigma notation in most non-trivial cases. Think nested loops, guards, etc.

"Loop notation is more readable..."

It is a matter of taste and practical experience up to a point. It's damn hard to just sweep under the rug all those extremely bright physics guys who love Fortran.

The contemporary parallelizing compiler writer definitely likes explicit reduction operators more than loops-for-reductions. Fair to say, though, that not all humans clearly share the same preference.

Also, nothing I've said here is meant to be in the slightest way critical of theorem proving for programs ... other than the narrow issue of automatic proofs being made an artificial constraint on compilation. I like lint except that the historic version is not nearly as sophisticated as I would like. So more power to you, from my perspective.

Mainstream?

You originally said, "10 years before these techniques hit the mainstream." I would say that "3 or 4 people work[ing] non-stop" on anything is extraordinarily far from anything I would call mainstream. I assumed you meant something more like "10 years before theorem proving is used where Java is used today." But now I assume you didn't mean that at all.

What I mean is: In

What I mean is: In principle, all the techniques needed are already here. Just fire up Isabelle (an interactive theorem prover: automated deduction is used locally, but YOU are in the end responsible for finding a proof), and you are good to go.

What is needed for the mainstream (here I include all those applied mathematicians that use fortran or matlab or whatever to program their numerical simulations; Java programmer not so much) to use interactive theorem proving is (in my opinion):

a) A streamlined language interface, i.e. using (nearly) the same language for writing down programs, theorems, and proofs. A simple logic, like set theory with some higher-order logic thrown in (in particular, no fancy type theory, only simple type theory).

b) A central site where you can submit your programs/theorems together with correctness proofs. This site would check the validity of your proofs automatically and in case of success include your stuff into the global knowledge base. Also a mechanism for searching and structuring (for example upvoting/downvoting) the knowledge base is necessary.

I think it would take a team like I described previously about 10 years to produce a) and b) in a quality that is acceptable for the "mainstream" and also takes care of such issues like parallelism (the site should for example have a mechanism for you to rent computing time to run your local programs which include libraries stored at the site). It won't happen by itself. But once it is done, I think the field will explode, and the mainstream will become more and more mainstreamy.

10 Years

I think ten years is an incredibly optimistic estimate.

10 years

Thinking more about what I wrote above, 10 years isn't merely an optimistic estimate, because the problem is ill-founded. No number of years is sufficient to solve a problem that has no solution.

In real-world situations (like generating search engine results) the equivalence relation used to assess associativity and commutativity need be neither known when the program is written nor constant during its lifetime of operation -- unless we stick to a completely trivial equivalence relation that amounts to an unchecked declaration that an operator is associative and commutative under whatever equivalence relation that declaration induces. The theorem checker for such a rule just always returns true.

Yes

I would tend to agree.

State of Parallel Programming

I have been trying to understand the state of parallel programming of stateful languages, and have some impressions and questions, and would appreciate feedback.

Multiprocessing adds cores to a CPU, and offers fast threading to get extreme performance. However, libraries, languages, optimizers and programmers must provide threads of code to run. Multiprocessing is fastest when threads do not share resources. The worst case occurs when one thread must wait for another thread to send a result. In general, threads will communicate; thus, processes must sometimes be suspended.

Whether or not an automated program proof can determine if a sequence of code can be processed in parallel is an open question. With help from an expert a prover works better, but it may not be economically practical.

Exhaustive testing can demonstrate the validity of a subroutine, provided one can test all combinations of input values for all arguments (assume all inputs are arguments). However, test time can increase rapidly as the number of arguments and of discrete argument values increases. For example, a two-argument subroutine with two possible values per argument yields four combinations, while a three-argument subroutine with three distinct values per argument yields twenty-seven.

A single test that shows sequential evaluation of a subroutine produces a different result than parallel evaluation proves it cannot be evaluated in parallel. Unfortunately, it may take many attempts to find a single event that proves a subroutine cannot be evaluated in parallel, but with thousands or tens of thousands of cores, the wall clock time will be reduced.
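
A minimal sketch of such a sequential-versus-parallel differential test in Python (the shuffled evaluation order merely stands in for a genuinely parallel schedule):

    import random

    def sequential_sum(xs):
        acc = 0.0
        for x in xs:
            acc += x
        return acc

    def reordered_sum(xs, rng):
        xs = list(xs)
        rng.shuffle(xs)   # one possible "parallel" evaluation order
        acc = 0.0
        for x in xs:
            acc += x
        return acc

    data_rng = random.Random(1)
    data = [data_rng.uniform(-1.0, 1.0) for _ in range(1000)]

    # A single mismatch shows the reduction is order-sensitive; any number
    # of matching runs proves nothing.
    test_rng = random.Random(0)
    mismatches = sum(sequential_sum(data) != reordered_sum(data, test_rng)
                     for _ in range(100))
    print("order-sensitive runs found:", mismatches)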

A program manager may be tempted to run a production subroutine in parallel that passed a million sequential-parallel tests, provided a risk-benefit analysis is favorable. Does such a test have any value as a precursor to program proof? In other words, would it be less expensive to run sequential-parallel tests to find candidate routines for program proof techniques, or would it be less expensive just to run a proof for an entire application?

Guy Steele's suggestion for programmers to use SUM(X) instead of a loop has a variety of implications. Steven Obua's statement, "Loop notation is more readable than sigma notation in most non-trivial cases," captures the essence of my concerns. Not only should languages have SUM(X), but they should also have set operations, matrix expressions, tuple formulas, and many other functions to replace loops. This sounds a lot like reusable objects; will this approach solve parallel programming issues?

The study of sigma led to integral calculus. Will it be necessary for parallel programmers to use higher math, including sets, matrices, tuples, etc.? An advantage is that a programmer fluent in matrix algebra can quickly solve such problems. Unfortunately, it is also a disadvantage, because not all programmers are fluent in matrix algebra. Will the change to more parallel programming change the job market by creating additional specialties, including a niche for theorem provers?

Will parallel programming be done in functional languages because they can be evaluated in parallel?

Codd wanted, in his abstract

Codd wanted, in his abstract formulations, a practical separation between a logical model of data and the physical realization of operations over this, plus transactional properties, plus a "relational" foundation that was so abstract in conception that SQL is but an example, not a final word.

Codd was the most precise language modeler we've seen to date. By the time of his death, he had a ridiculous number of rules for determining whether an RDBMS was truly an RDBMS or not. You can read about them in his book.

The gist of Codd's point is that we can't begin to think about the physical layer until we understand the logical layer.

By the way, there was also a fascinating hardware model for this stuff, called the Gamma Database Machine Project.

Indeed, SQL is horribly dated for its failure to be cleanly integrated with a surrounding and more general programming language. The way it is used in various web frameworks is generally disappointing. Personally, I would guess it is that awkward integration that, until recently, fed a lot of no-sql hype.

SQL is horribly dated because it assumes that the application knows its target schema ahead of time. Ruby on Rails' ActiveRecord implementation shattered this a priori knowledge, and since then developers have grown really comfortable with various ways to late-bind structures with functions over those structures. In reality, an application using SQL can never assume a static schema, since a schema needs to evolve as the problem domain changes. Rails makes this clear with its Migrations subsystem. But the Migrations subsystem also tends to be marginally bad at handling large tables; I don't believe it can drop and create production tables using standard large-table copy techniques for RDBMSes. For such scenarios, document databases are easier, and most people suck at schema design anyway, so avoiding RDBMSes frees them from the headache of having to deal with a bad schema. There are also some mathematical issues at play here, such as the computational complexity of schema evolution.

P.S. If you want "awkward integration", please expose yourself to PeopleSoft or SAP systems, where they have bizarre table names like these. Then do yourself a favor and go read William Kent's Data and Reality. Now you get a feel for why consultants take 6 months just to determine how to do something, and collect a handsome sum at that.