How to handle errors

Having learned C++ I thought exceptions were the new, cool thing, simply the new Right Way of doing it. I just never could get it right. I thought the problem was that I had been doing C for far too many years, until I stumbled across

Exceptions considered harmful

I then realized I've always done fatal error handling by killing the currently executing thread (or process when not using an RTOS), and normal handling by return value. For the most part it worked well. There was one thing I had been missing: automatic resource deallocation, where the focus of "automatic" is more on "you can't forget" than on "the compiler will do it for you", though I wouldn't mind if the latter could be done safely.

I then stumbled on

Flow manifesto

Using Flow concepts as a basis, would it be possible to get the same help with generic resource deallocation as Flow offers for memory deallocation?


Not sure I understand you.

I didn't fully understand the question. Do you want something like unwind-protect in Lisp?

I haven't tried

I haven't tried unwind-protect myself, but as far as I could see when using Google it seems to work in a similar way to exceptions, with all the pros and cons that implies. That code isn't safe by default is a major concern for me regarding exceptions. I also have problems with how to make them scale when using distribution and parallelization.

No Silver Bullet

Flow's support for automatic memory management is just what C already does for stack-based variables, albeit slightly finer-grained - there's no magic, it just detects live ranges within a function. (It seems the Flow manifesto is ignoring or forbidding the existence of heap-based objects with complex lifetimes.)

It's not clear this would be a good model for generic application-wide resource clean-up on error.

No silver bullet I'm afraid

Yet, there are ways to make exception-safe programming easier; see:
http://dlang.org/exception-safe.html

Also, I'm not sure that it's a good idea to look at this "Flow" thingy; the author claimed many grandiose things, but the source code is still not available.

I see a problem with states

I see a problem with state when using exceptions. Imagine the following:

atomicUpdate(Object obj) {
    Mutex mutex = takeMutex();
    obj.a = func1();
    obj.b = func2();
}

If an exception occurs when calling func2, obj will have an illegal state. This is where I think exceptions seduce programmers into writing unsafe code. The above problem could be fixed with an extra try block, but I think a good language should give you safe designs unless you really try to do weird stuff.

This is what "exceptions consudered harmful" tries to explain, and I don't see how RAII solves that. Have I missed the point here?

Another thing I'd like to throw into the mix is parallelization (does that word exist?). The way Flow seems to reason is that an exception is merely another value of the calculation. The above example would then result in obj.b getting an exception value. They say something about the return value having to be checked; whether assigning it to a variable would be enough is unknown to me.

The advantage of this would be that the function calls could be implemented with asynchronous messages if the functions take a long time. func1 and func2 could then execute in parallel.

I'm sorry if I'm not clear about exactly what I want. I guess this is because I don't know...

But things I consider important are:
- Safety. State should not become inconsistent due to errors, unless you explicitly ask for it. I.e. as long as you keep it simple and do it the lazy way, things should be safe.
- Distributable and parallelizable. Supporting both SMP and clustering of computers.
- The algorithm is clearly visible.
- The possible errors are clearly visible.

Currently the above list feels impossible to fulfill...

That's why transactions were invented

The above code needs transaction semantics, with either both func1() and func2() completing successfully (and their side effects occurring), or neither. (I'm assuming the Mutex object, in your example, frees the mutex unconditionally when the enclosing scope is left, regardless of how.)

This is related to exceptions--an exception that propagates out of a transaction needs to terminate the transaction--but many imperative languages have lousy transaction support, if any at all.

But transaction-aware exceptions don't pose any particular problem.

Transactions are nice

... but they scale poorly for common effects patterns, such as: observer patterns, multi-agent blackboard metaphors, or long-running data streams.

(Transactions work best as a "smart" mutex.)

If you're looking for a decent semantics to make a few classes of errors considerably more graceful, consider lightweight time warp, which provides a limited time-frame in which developers can abandon commitment to an erroneous activity.

Probabilistic semantics would also work, i.e. the erroneous branches could be eliminated. This costs CPU and memory, but it parallelizes and scales very well.

Any sort of shared mutable state scales poorly

transaction semantics or no.

I think you generalize

I think you generalize prematurely.

Please consider studying cellular automata, graph rewriting, lightweight time warp, tuple spaces, content-addressed networking, distributed hash tables, filesystems, databases.

Then consider physics, and the world around you. The biggest computation you'll ever experience is shared, mutable, and (at least from our perspective) very stateful.

Let me clarify a bit

Shared mutable state scales poorly when you have to maintain an arbitrary global invariant. Many of the things you mention don't have that property, or only approximate it. I will read up on LTW--which is a topic I'm not familiar with.

Maintaining arbitrary global

Maintaining arbitrary global invariants isn't very scalable. Though, one can carefully choose global invariants that can be achieved without sacrificing scalability.

Locality of reference makes sharing relatively simple

When considering reality as a computation, the "shared" quality is limited to the rate at which information propagates across reality.

When we're talking about events a light-year distant, what we are saying is that those events did not affect our local "shared" state at all for a full year after they occurred.

So we have shared state that we can reference, but if it isn't local then it is certainly stale.

Shared systems in general are more scalable when there is a finite rate at which information propagates. The subregions must be locally consistent but need not satisfy any non-local constraints nor acquire exclusive access to any nonlocal resources.

Ray

Indeed. And those are

Indeed. And those are properties that can easily be modeled in computation systems.

Cellular automata, and the generalizations used for large-scale scientific computing, are certainly an example of information propagating at a finite rate. Reactive Demand Programming explicitly models latency for similar reasons.

...and this is, presumably,

...and this is, presumably, one reason Einstein didn't like quantum mechanics's "spooky action at a distance" (non-locality). Reality is at least as hard to reverse-engineer as any other sophisticated program.

might be hard to treat uniformly

Mats: Currently the above list feels impossible to fulfill

(Aside: is that a programming language question, or C++ usage question? I don't think a PL can do everything, and C++ induces knuckle-gnawing in my experience. If a PL tried to make everything visible, you'd get obscurity via quantity.)

Two-phase commit, in one form or another, makes a good general recipe for casual transactions done in-the-small, below the radar of other tools. It amounts to 1) prepare all new state needed to commit, then 2) atomically commit all changes together. This implies the first phase retains all old state while gathering new state, so a non-local exit just discards new state without changing old state.

Once a decision to commit is in force, you can't interrupt, or must retry until done. In the example given, you can call func1() and func2() before taking the mutex, then assign a and b atomically, assuming assignment cannot throw an exception. That's assuming those two calls don't cause stateful side effects that need undoing.
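For concreteness, here is a minimal C++ sketch of that reordering. Object, func1, func2 and the mutex are carried over from the example above as assumptions; the commit phase assumes the moves cannot throw.

#include <mutex>
#include <utility>

// Assumed from the example: Object has fields a and b plus a std::mutex
// member, and func1/func2 compute the new values (either may throw).
void atomicUpdate(Object& obj) {
    // Phase 1: prepare all new state. An exception here simply discards
    // the partial results; obj is untouched.
    auto newA = func1();
    auto newB = func2();

    // Phase 2: commit. Only non-throwing operations from here on.
    std::lock_guard<std::mutex> lock(obj.mutex);
    obj.a = std::move(newA);
    obj.b = std::move(newB);
}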

Even when using immutable resources, something similar must happen when you can fail to get every resource needed at once. The last part of phase (1) must be "pin all these things in place" so they can't disappear during commit. The goal of the first phase is to ensure the second must succeed (eventually).

It would be hard to make a PL suitable to handle every variant efficiently, if you wanted to allow both optimistic and pessimistic schemes, along with every sort of conflict graph possible. I'm not sure how to make semantically complex things simple in a language, without shoe-horning into a smaller phase space, which might be too small for some problem statements. (I hope I'm addressing your question.)

My question stems from a

My question stems from a language I intend to design, currently nicknamed Sybil (any association with the Fawlty Towers comedy series is intentional), which stands for SYstem BuIld Language. It comes from an observation that there seems to be no good next-generation language with the potential to replace GNU make.

But that is a side-track, since I'm thinking more broadly here. I think that a language should prioritize safe execution, scalability and a kind of what-you-see-is-what-you-get (WYSIWYG). Performance comes after. It should be possible to optimize in a secure manner, though at the cost of slightly more obscure code. My experience tells me that in most cases people will throw hardware at it instead to improve performance, hence the need for scalability.

I'm thinking that maybe functions should have an implicit atomic property, where the language promises that the function will either retry until successful, fail with all side effects undone, or finish successfully. I'm not sure what the consequences would be, however...

(OT: replacing gnu make)

one of the major problems with build systems, as i see them, is that there are more twisty effed-up projects with twisted effed-up build requirements than you can possibly imagine. :-(

assigning an errval to a variable...

My own preference in systems that generate errvals is that, unless an operation X is specifically enabled (and documented) to take errvals as inputs, any subexpression of X that returns an errval causes X to return the same errval before the operation is executed.

This is relevant because assignment is an operation. And it is not one that I'd prefer were enabled to handle errvals, because stored errvals (errvals that are in state rather than context) are tricky. I want it to be very clear whether the errval returned from a variable reference is stored there or whether the variable reference (say, to a non-existent variable) caused the errval to be generated. This is the same reason you never store 'nil' in something that can be referred to by any operation which may return 'nil' for a different reason.

Rather than obj.b getting an errval stored, you would have the expression 'obj.b = func2()' returning an errval before the assignment is executed. Similarly, the call to 'atomicUpdate' would return the errval, causing its *own* continuation to return the errval rather than do further processing, etc.
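A rough C++ sketch of this propagation rule, using std::variant as a stand-in for errvals (ErrVal, Result, add and assign are illustrative names, not from any existing library):

#include <string>
#include <variant>

struct ErrVal { std::string reason; };          // an error value
template <typename T>
using Result = std::variant<T, ErrVal>;         // a value or an errval

// An operation NOT enabled to take errvals as inputs: if any argument
// carries an errval, return that errval before doing anything.
Result<int> add(const Result<int>& x, const Result<int>& y) {
    if (auto* e = std::get_if<ErrVal>(&x)) return *e;   // propagate, don't execute
    if (auto* e = std::get_if<ErrVal>(&y)) return *e;
    return std::get<int>(x) + std::get<int>(y);
}

// "Assignment" under this discipline is also an operation: it refuses to
// store an errval in state and instead propagates it in context.
Result<int> assign(int& slot, const Result<int>& v) {
    if (auto* e = std::get_if<ErrVal>(&v)) return *e;   // nothing stored
    slot = std::get<int>(v);
    return slot;
}

Note that assign returns the errval without storing it; an errval later found in a variable can then only mean someone stored it there deliberately.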

In principle, you don't generally want errvals in your state (variables), only in your context, which makes them behave a lot like exceptions. An operation that stores errvals in state rather than returning them in context would be useful for error handling and debugger building, etc., but would be too dangerous to use for storing values generally.

Ray

RAII to the rescue

Regarding "this is what 'exceptions consudered harmful' tries to explain, and I don't see how RAII solves that", I have on occasion used RAII to solve exactly this problem, you just extend the idea of the "R" to cover other things like changes to a data structure:

Write a class where the constructor manipulates a data structure, and the destructor undoes the change unless a commit-method was called earlier...
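As a minimal C++ sketch of that idea (the guard class and the vector operation are illustrations only, not a general library):

#include <vector>

// Applies a change in the constructor and undoes it in the destructor
// unless commit() was called first.
class PushGuard {
public:
    PushGuard(std::vector<int>& v, int value) : v_(v) { v_.push_back(value); }
    ~PushGuard() { if (!committed_) v_.pop_back(); }   // undo unless committed
    void commit() noexcept { committed_ = true; }
    PushGuard(const PushGuard&) = delete;
    PushGuard& operator=(const PushGuard&) = delete;
private:
    std::vector<int>& v_;
    bool committed_ = false;
};

void transfer(std::vector<int>& a, std::vector<int>& b, int value) {
    PushGuard ga(a, value);
    PushGuard gb(b, value);   // if this throws, ga's destructor undoes the first push
    // ... any other work that might throw ...
    ga.commit();              // from here on, nothing throws
    gb.commit();
}

(As discussed further down the thread, several such guards compose into a larger transaction as long as at most one commit can throw, and that one is called first.)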

Of course this doesn't cover all cases, and you need to sprinkle around some other ideas like the ones from "Exceptional C++", but once you get into the mind-set it works pretty well (or at least I haven't found (m)any cases that required much further thought).

(Hm, presumably it worked out well for me because in my job most operations with external side-effects that would make this approach difficult when dealing with multiple operations in a single transaction are high-latency operations that are handled asynchronously, which means that there is an explicit state-model at a higher level that deals with it. Still, this kind of generalised RAII does cover some more ground, and in practice writing C++ with heavy reliance on exceptions to handle error conditions works very well for me.)

This sounds like

This sounds like transactions to me, or have I missed something?

Neat way of implementing them in C++ though.

It is very much like

It is very much like transactions, and if you are lucky they scale/compose well, too. Here "scale/compose" means that you can use several of them simultaneously to form a larger transaction, and "being lucky" means that the transaction objects don't need to know about each other. That relies on several things, for example on no more than one of the commit methods potentially throwing (if there is exactly one, you call that throwing commit first, then all the other nothrow ones).

When I implemented a

When I implemented a crash-safe file system for an RTOS, we used transactions. We had a separate class keeping track of stuff that could cause a commit to fail, which usually was resources. The only cases left where a commit could fail were due to broken media, so for those cases we simply put the filesystem in a read-only state and gave up.

In some cases, however, it was possible to back out a change. This could get pretty messy pretty quickly, since it meant you had to accept that things might not be the same when you backed out the change as they were before you started it. Even though this was a single process handling it all. So you had to consider what consequences this could have...

So for me, transactions are nice when they are short with no dependencies on other transactions. After that, things start to get messy...

I also have doubts about how transactions work in a distributed environment. Can a transaction be distributed? If not, things will soon get very messy.

So while I haven't ruled out transactions completely yet, I do have my hesitations about making them an integral part of the language's error handling.

A hammer for a nail ... I

A hammer for a nail ... I find the concept of a transaction very useful, not always, but often enough; however I would probably not even try to use it in a distributed way, because of the "attack at midnight" problem; so in practice I try to use the strong notion of a transaction locally whenever it can be applied, but also understanding that a distributed system is too different [to pretend that it's the same and attempt to apply the same principles]. Of course this doesn't [want to] answer your question about whether to include them, but I really do appreciate having a large selection of tools available because it increases the chances of having the right abstraction/mechanism available...

I see a problem with states

I see a problem with state when using exceptions. Imagine the following:

atomicUpdate(Object obj) {
   Mutex mutex = takeMutex();
   obj.a=func1();
   obj.b=func2();
}

If an exception occurs when calling func2, obj will have an illegal state. This is where I think exceptions seduce programmers into writing unsafe code.

I don't know for sure, but I believe that in C++ the Mutex variable is not released immediately, so you can do something about the object in the exception handler.

My solution in dodo is to make it easy to write "undo" code for the shared variable. Even better, use versioned variables whose shared value is not updated until the transaction is successful. Only problem: like Flow, I cannot claim to have an implementation...

Note there are other problems with errors/exceptions, like what you do with them when they are thrown in an asynchronous call.

Resources Management in RDP

The paradigm I'm developing, Reactive Demand Programming (RDP), makes resource management very easy.

Resources are influenced, at any given instant, by a set of command and query signals (collectively called 'demand' signals). Most resources are fully demand driven, such that the resource can return to a rest state (be powered down, unloaded, deallocated, whatever) when there is no demand for the resource to be active. Durations of input and output signals are tightly coupled, so that disruption upstream (possibly caused by a failure) will quickly propagate and release resources. (RDP also supports speculative evaluation and anticipation, so it is possible to load and prepare resources if it seems they'll be necessary in the near future.)

Avoiding explicit stateful management of resources ensures they're far more robust to disruption of the demand signals, or other partial failure. Though, stateful idioms aren't all bad: combine keepalive signals and a logical timeout idiom, and the result can be declarative and robust. It isn't state that's the problem. It's events, including imperative manipulation of state.

(Aside: It seems RDP addresses every goal in the Flow manifesto, though my current implementation doesn't attempt any implicit parallelization. The designs differ significantly.)

Flow, and all that

The Flow manifesto assumes that the goal of programs is computation. This is also an assumption of functional programming. But, in practice, for a large number of programs, the goal of the program is to perform some externally visible action. Computation is a means to that end. The Flow crowd is trying to figure out how to program supercomputers. Most programmers aren't doing that.

Many of the practical problems with resource allocation today reflect computations that are interrupted for some external reason - the user navigated to a new web page, the machine was placed in a standby state, a game player has left an area causing an NPC to become inactive, or activity needs to be paused because the user has something more important to do, like take a phone call. All of those situations require resource recovery. Solutions to all those problems tend to be ad-hoc. Some theory would be helpful.

The Flow manifesto does

The Flow manifesto does focus very heavily on scalability for performance reasons. My own motivations for scalability were to support rich, open composition of programs and services.

RDP very effectively addresses your concerns regarding effects and interruption.

Where can I find more

Where can I find more information about RDP? Google thinks this is Remote Desktop Protocol...

Reactive Demand Programming

Reactive Demand Programming is dmbarbour's pet project. He has a blog at http://awelonblue.wordpress.com .

Needs more examples

After reading the blog, I'm not seeing much in the way of examples or use cases. "I tested dynamic evaluation with a naively recursive implementation of Fibonacci function, and a simpler behavior that rotated through simple math functions in accordance with the clock." isn't enough. This idea needs to be demonstrated to be useful on a hard problem or two.

A good test case would be to smoothly manage scrolling through a large tiled image, one of the problems map programs must solve. Requests for tiles are made from a remote server, some in anticipation of need. Outstanding tile requests may be cancelled if the user scrolls in another direction before the tile is delivered.

Agreed. As noted in the same

Agreed. As noted in the same post, RDP is not "production ready" yet. Its design is complete, as are the behavior primitives, but I have yet to develop the set of resource adapters to make it useful. (Since I only work on this a few hours a week, it takes time.)

Your suggested test case is a decent one, but requires some setup. Like, you know, first having a mouse and a window, a way to display images, and access to HTTP resources. Lots of unshaven yaks are in my way. (Tile-based map servers are something I've done a few times, but usually with a team; it isn't something that much interests me.)

You could always Google for

You could always Google for the full name (Reactive Demand Programming). But the best place to read about RDP is documentation for the implementation. (A few sections are a mess at the moment. I've been editing heavily over the last month, to make the docs more concrete.)

I have read your blog and

I have read your blog and README-file, and still haven't figured out what RDP is. I'll do some more Googling and see what I find. But somehow it seems to be based on message passing (called signals).

Framing one's Understanding

RDP is not based on message passing.

If you're grasping for something familiar with which to frame your understanding of RDP, try spreadsheets, but with an extra rule that a computation can effectfully push data to special cells rather than just pull data. That effect results in a much more expressive, modular, and extensible system than a typical spreadsheet. [edit: just added a section near top of README covering spreadsheet analogy in some depth; let me know if it helps!]

(One could implement a spreadsheet using message passing. But that doesn't mean spreadsheets are based on a message passing abstraction.)

There are other models that are close to RDP. If the spreadsheet concept doesn't help, you could try some of the writings on synchronous reactive, synchrony hypothesis, temporal logic (Dedalus), temporal database (Datomic), functional reactive, etc.

I expect (hope!) RDP will be one of those things where experience is more effective than explanation. Because, apparently, I'm not so good at the explanations.

I think I understand the

I think I understand the spreadsheet comparison, but I haven't understood what problem you are trying to solve.

If I may suggest something, I'd like to suggest that you start the explanation with a problem. Then show how this problem is typically solved and what pros and cons those solutions have. Then introduce RDP and explain what an RDP solution would look like and its pros and cons. I think this would make it a lot easier for me to understand RDP.

Solving Systemic Problems

RDP isn't really aimed at a particular "problem", but rather at effectively addressing a common confluence of cross-cutting concerns.

Most developers are mid or small picture only. Developers also favor a path of least resistance. This leads to code that isn't very reusable in many contexts outside the context in which it was developed.

RDP constrains expression to better support reasoning, reuse, and composition. It addresses concerns of concurrency, timing, consistency, persistence, resource management, partial failure, predictable performance, open extension, live programming, resilience, clean service failover, heterogeneous view transforms, ad-hoc workflow models, first-class frameworks and overlay networks, scalability, modularity, and pervasive security.

This feeds back to help the small picture developer. Less rework is needed to adapt old solutions to new contexts. Less effort is needed to reuse and reason about code and services from other developers.

I think examples would be excellent didactically, and I plan to develop many. But I'm not sure how to explain the actual problem I'm solving; it's certainly meta to any particular example (which is as it should be, for a paradigm).

I think I understand now why

I think I understand now why I've had so much trouble understanding RDP. You have seen many problems that you think share a common ground, and thought out something that provides a good framework for solving this specific group of problems. Now you try to explain this generic framework and all its pros, without explaining the whole journey you made to come up with it.

What I'm trying to understand is the journey that ended with RDP. Since I'm a practical person, this is important for me in order to better understand the concept.

The journey towards RDP

I got into language design after reading a few cyberpunk novels (Neal Stephenson, Tad Williams, Mark Fabi) and wondering how one might support the sort of programming culture envisioned in those novels.

I understood these to be describing very large scale, federated mashups of services. UI is a form of ubiquitous program manipulation and service integration, i.e. wiring services together. There are billions of potential programmers. Many are out for your throat or your wallet. There is no reset button. There is much hidden information.

A basic question I had was how a programmer could reason about such a system while retaining sanity. How do we reason about safety, security, consistency, concurrency, resilience, robustness, and resources? The system is too large, too complex, to keep in the head of even an experienced developer. What about people who aren't so skilled? How might a two-year-old learn and grow with such UI and obtain useful, valid intuitions that will still serve well in ten or twenty years?

I eventually reached a conclusion that composition is the most essential meta-property to help the large scale systems. A set of compositional properties P has the feature that there exists some function F for which: forall X,Y,*. P(X*Y)=F(P(X),'*',P(Y)). That is, developers can reason about properties of a composite with only shallow knowledge of the components. This becomes a valid basis for intuitions. If critical safety, security, consistency, concurrency, resilience, robustness, and resource management properties can be made compositional, then users are able to quickly reason about those properties and focus the bulk of their attentions instead on domain modeling and correctness concerns.

Equational reasoning is also useful - not for reasoning about programs... but for reasoning about CHANGES to programs: refactoring and abstraction (and optimization). The most important equational reasoning properties are identity, idempotence, commutativity, and associativity. These are properties of declarative expression. (Imperative expression only has associativity and identity.) Functional programmers often seem to operate under a prideful delusion that "pure" expression is necessary for rich equational reasoning. But effects, carefully chosen, can achieve idempotence and commutativity.

RDP is the result of squeezing rich equational reasoning and useful compositional properties into a simple, uniform, efficient model.

The reactive dataflow aspects support live programming, consistency, concurrency, and resilience. Behaviors support secure encapsulation and distribution of authority in open systems (via object capability model patterns). Type systems can provide additional safety. Effects in RDP are achieved based on sets of signals (called 'demands') influencing a resource, which ensures idempotence and commutativity, which supports refactoring and abstraction. Resources are demand-driven, so if the demands are disrupted the associated resources can be released or returned to a hibernation state.

I de-emphasize correctness. More precisely, I believe supporting exploratory programming and growth of programmers is more important than requiring programs be correct, and correctness doesn't seem to be compositional even if I wanted it to be. The security, consistency, and resource management features can easily limit damage from incorrect programs. But RDP is not incompatible with use of interactive proof systems, constraint systems, and the like. I think those will still be useful in-the-small, for developing individual services or applications or even art assets. Soft constraints and strategies can enable a computer to fill gaps in a user specification with something mostly adequate, enabling a developer to focus on incremental refinement of programs. I envision such systems in-the-small being an important part of the RDP user experience.

Anyhow, RDP is what came of asking: how might a paradigm scale to a billion programmers? The OP mentions the Flow Manifesto, which emphasizes scalability for entirely different reasons. I'm a believer in convergent design. The motivations ultimately are irrelevant: only the structure matters, and when developed at sufficient depth and breadth, any set of PL requirements will imply similar structure.

I wonder if this helped you at all.

It has not been my experience that sharing RDP's history or motivations helps anyone understand it. I expect that RDP will be easiest to understand by using it, and that all this history is at best interesting trivia and at worst offers the impression that I'm hand-waving at abstract pipe-dreams in the clouds (as opposed to developing a concrete implementation of RDP).

It has not been my

It has not been my experience that sharing RDP's history or motivations helps anyone understand it.

No, I find it very helpful.

Well spoken.

I wonder if this helped you at all.

It has not been my experience that sharing RDP's history or motivations helps anyone understand it.

Honestly, this helps a LOT. Understanding the motive and seeing how the means are related to it is absolutely vital for me.

The few paragraphs above, at least from "reasoning about such a large system" to "scale to a billion programmers", and especially the part about composition being the most important property to achieve it, should be included in any introduction to RDP.

Still, I have a question or three.

How do the intuitions acquired working with large RDP systems help a programmer who's at the periphery of such a system working with code that's implemented in some paradigm where RDP is completely irrelevant? Can he easily figure out what is going to happen when this code attempts to interact with the system? And will his RDP-formed intuitions lead him astray in reasoning about, say, relational databases, unification systems, data-directed code, OO code, ADT-based code, or simple imperative programs?

Ray

Bear

At the periphery, an RDP

At the periphery, an RDP system must provide 'adapter' code to various resources and foreign services. Adapter code represents access to a resource via an RDP behavior (or small record of behaviors). Those behaviors must protect RDP's compositional and equational reasoning properties. Developers within RDP will simply learn the API exposed by the resource adapter (unless they really hate it and have a better idea). Or in short: this is a problem answered by a layer of indirection.

In Sirea, protecting RDP properties at the adapters takes discipline. Developers of the adapters tend to use behavior-constructors with `unsafe` in their names (e.g. unsafeLinkB, unsafeOnUpdateB). They are safe with respect to Haskell's types, but are unsafe for RDP's properties. The adapter libraries should apply these constructors carefully, hide them behind a wrapper, and export safe RDP behaviors.

There is no one-size-fits-all for resource adapters. But use of intermediate state or a blackboard metaphor is appropriate for bridging many imperative models. Ad-hoc workflow can be modeled by waiting on shared state. Modeling a resource as exclusively controlled (e.g. by linear types or by a separate RDP app/agent) easily sidesteps the idempotence concern. Most sensors are a relatively easy fit for RDP. Actuators - such as rendering to screen - sometimes take a little extra effort (e.g. to model a window manager or similar in case of multiple independent render demands).

Relational databases could actually be implemented or fully adapted to the RDP model, especially if one uses a temporal database. The "transaction" model was developed to adapt relational to imperative systems. Transactions are a poor fit for RDP, but aren't essential to the relational model. I know that at least Oracle supports subscriptions to queries.

With regards to OO and ADTs: RDP can represent most useful OO patterns directly, and can propagate ADTs or (for open systems) first-class sealed values.

Anyhow, RDP + state is a complete programming model. While a periphery exists, and will exist for a very long time, it will become progressively less relevant to RDP developers. Most resources or services only need to be adapted once. Popular resources and services will be adapted early. Some may be reimplemented for RDP. Eventually there will be middle layers that model things the RDP way, often superior and more convenient than raw adapters. I already have ideas for accelerated graphics pipelines, web services, and GUIs. As RDP gains killer applications, adapters will begin to run in the opposite direction - i.e. from envious OO and FP developers. :p

Very interesting

Very interesting reading!
Your view did put some perspective on my design work for my own language. But to the question of whether it helped me, the answer is: yes, in two ways:
The first part was a good enough abstraction rocket to send my mind to a high enough altitude to get rid of those details that previously clouded my mind. I re-read your readme file and suddenly understood a lot more.
The second part was a good introduction to understanding what the introduction in the readme file tried to say. Previously it was too fuzzy for me to grasp. Now I was myself in a fuzzy state, so it felt natural...
However, there is still one thing that makes it hard to understand. The step from those fuzzy ideas to a concrete example is a bit steep. It is manageable, but it might explain why some people don't seem to be helped by your background story. I do think that if I put some more time into re-reading the readme file, I might understand most of it. I'm currently working on it...

There is one thing though: The readme file has an incorrect link to tangible values. It goes to naked object instead.

Thanks for the notice; I've

Thanks for the notice; I've fixed the link to tangible values locally and it will be pushed with the next major commit. I spent most of my RDP hours in September trying to improve that doc, but have yet to figure out how to fit in a background story. (Maybe it would be better to write it up in a blog article and link to it. For now, I'll just link here and to the reactive-demand google group.)

Resource Theory

I think the reason resource management is ad-hoc can be understood by comparison to a particular resource which is reasonably well understood: memory.

We manage memory with reachability checks and garbage collection. The reason this works is that the resource handle for memory is a pointer which is itself stored in memory, forming a network of memory blocks.

In other words, the relations are manifest. In addition, a logically side-effect free uncoupled function (free) can reclaim memory (the actual side-effect is unobservable to the program because we're freeing unreachable memory).

The key here is that the collector orders the memory to be freed so that the same block isn't freed twice.

So why can we not do this with other resources? File handles, locks, whatevers? Why can't we represent them in memory with handles and execute a destructor prior to freeing the memory?

The answer is we can, but it doesn't work properly unless strict conditions are met: the freeing of unreachable resources must be independent of each other (so ordering must not matter), and the freeing process must not introduce new resources, nor modify any reachable ones.

An example of a resource satisfying these rules is the temporary file. Obviously, you can just delete them when they're not reachable, the order doesn't matter, and the mutation to the file system is not observable (by definition of "temporary" no one is supposed to see the file other than the creator).
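For illustration, a small C++ sketch of such a resource; the wrapper class is hypothetical, but the std::filesystem calls are standard. Its cleanup needs no ordering and is unobservable to the rest of the program, so it could safely be driven by a collector:

#include <filesystem>
#include <utility>

// A temporary-file handle: deleting the file when the handle becomes
// unreachable is safe because nobody else is supposed to see the file,
// and removals can happen in any order.
class TempFile {
public:
    explicit TempFile(std::filesystem::path p) : path_(std::move(p)) {}
    ~TempFile() {
        std::error_code ec;
        std::filesystem::remove(path_, ec);   // non-throwing cleanup
    }
    const std::filesystem::path& path() const { return path_; }
    TempFile(const TempFile&) = delete;
    TempFile& operator=(const TempFile&) = delete;
private:
    std::filesystem::path path_;
};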

An example of a resource utterly failing these rules is a GUI resource. Suppose you want to get rid of some sub-window. You cannot just delete it and free up the associated OS resource, because that would make all the child windows unreachable, and they'd be deleted in an arbitrary order. Meanwhile .. the screen never got repainted!

Instead, you typically have to post a delete message, which is then sent to the window and its children in a well ordered fashion, and callbacks respond to this to perform appropriate take downs.

Whilst not claiming the GUI resource problem is well understood, it is clear that most GUIs do in fact possess a sophisticated resource management system -- and one which is not "general" or "abstract" in the sense of being amenable to some Grand Unified Theory of All Resources (GUTAR :)

Another use case: time as a resource, as managed by operating system schedulers. And another core resource: access rights to resources. The core concept of Windows NT is key management, whereas Unix systems use a very ad-hoc system of permissions (yet, Unix is probably more secure, despite a deliberate attempt to build a better security system).

I fear the only way to understand resource management better must start with a catalogue of use cases and details of how existing systems try to manage those particular resources. All in all we don't really know how to manage even particular resources properly, let alone have a general theory.

Linear logic has direct

Linear logic has direct applications to resource control and management. Among other things, it can strongly enforce life-cycles and protocols. LL does not need GC; the task of releasing a resource is enforced on the programmer.

My own approach, with RDP, couples life cycle with communication - signal activity has a start time and end time, and the duration between these times is preserved from input to output across every behavior processing the signal. By associating resource control with active signals, RDP does not need global GC; resources (even memory) can be released based only on local knowledge. (Unlike global GC, local GC scales very well and is easy to keep real-time.)

OOP and message-passing models are inherently quite poor for resource management. This is because there is no clear indication of "this is the last message". LL can help address this, but only if we abandon aliasing and enforce a strong ordering on messages. RDP takes an easier route of abandoning message passing.

Most people are unwilling to grasp just how much message passing and other event systems hurt them. I expect this is because message passing is the only thing they know. If you step outside your comfort zone, I think you'll obtain a much better understanding of resource control.

I've done a great deal of

I've done a great deal of message passing lately, since the RTOS I have worked on for the past 12 years is based on message passing as _the_ synchronization primitive. The kernel developers hated it when they were forced to add semaphores, mutexes and other stuff because customers were used to using them...

But I have never felt that releasing resources has been a problem with message passing. I have other issues with it though. But having a "last message" indicator in the last message is nothing new. But then, I'm not doing OOP. Just plain old C hacking...

As for RDP signals (in the RTOS I use, signals = messages, if you wonder about my confusion earlier): is this a state held by an RDP object that can be viewed by others?

If you invoke an RDP object remotely, how do you do so without message passing? Do you use remote procedure calls?

You can develop

You can develop application-specific 'last-message' indicators, of course. But doing so suffers from being application-specific: it is not compositional, it is not well supported across libraries; there is no automatic support in case of disruption or other forms of partial failure; it takes a lot of discipline to get it right each time.

In an imprecise and informal sense, one might understand a signal as a state maintained by an RDP object (behavior) that can be viewed from another. Signals in RDP are point-to-point. RDP behaviors are invoked by signals. The aforementioned 'state' is not actually stateful, cannot accumulate information from its own history; it is more a value that changes over time (could naively be modeled as T->Maybe a); RDP signals must be maintained over time because their complete future isn't statically known.

As necessary, communication of signals can be implemented above message-passing, or shared memory, or TCP, or whatever is available. Just like "message", "signal" is an abstract communication medium and the implementation can be hidden behind the abstraction. Even if implemented above message passing, signals offer very nice properties: signal updates are idempotent, commutative, eventually consistent, have well-defined disruption properties (and implicit heartbeat, if needed). Also, unlike message streams, multiple signals can be composed (via coupling on T) without introducing any semantically observable non-determinism.

Still a bit abstract for me.

Still a bit abstract for me. Could you give an example? Let's say I have inotify as an event generator. I then have two destinations for events: one that should be triggered when directories are altered and one when files are altered.

The one that triggers by directories will add a new notify on the created directory. The one for files will display this on screen and then go away.

When a directory is removed the watch for the directory is removed.

Something like that (it does not need to be precise). How would this be modeled in RDP? How would events ripple through the system? What would drive things?

The use of `inotify` is to

The use of `inotify` is to support low-latency observation of filesystem state. For RDP, do not ask how inotify "events ripple through the system"; ask how representation of filesystem state propagates.

In RDP, developers can easily represent continuous observation of a file or directory. The result of such observation is a RESTful value: the contents of a file, the directory listing. The observed value may change over time, e.g. due to saving a file from an external text editor. The new value will propagate through the RDP behavior. That propagation is formally modeled with a signal, i.e. with the file contents changing at a specific point in time. The signal allows faithful, deterministic composition of file state with other time-varying values.

At least for now, RDP must provide adapters to the existing imperative services, including filesystems. Fortunately, many events near the outer edges of software are really just hacks to support RESTful or declarative communication despite an imperative programming system; for these, adapters are easy. An RDP filesystem API might easily leverage `inotify` or `FindFirstChangeNotification` or even a polling loop behind the scenes - based on what's available, efficient, and easy to use.

In the rare case it is necessary, there are ways to model events in RDP. In essence: you represent events by recording events into state, then you use RDP to observe that state and influence other state.

Some RDP pseudocode

The one that triggers by directories will add a new notify on the created directory. The one for files will display this on screen and then go away.
When a directory is removed the watch for the directory is removed.

I'm sure I'm stepping on David's toes here, but here's some RDP pseudocode which could implement this application, using Haskell-like syntax for familiarity's sake:

-- Whenever a file is modified, display its description on screen for
-- four seconds.
map WindowManager.showStringNotification (freshFiles "/" 4s)
where
  freshFiles dir age =
    -- Get all fresh files under a directory, recursively.
    let children = FileSystem.ls dir
        files = filter FileSystem.isFile children
        subdirs = filter FileSystem.isDir children
        isFresh file = Clock.now - FileSystem.lastModified file < age
    in
    filter isFresh files ++
      flatmap (\subdir -> freshFiles subdir age) subdirs

Variables like freshFiles, isFresh, FileSystem.isDir, and (++) stand for behaviors. They're impure abstractions, but their side effects operate continuously over time. There's no need to set up and tear down watchers; just call things like ls. And there's no need to hide a notification when you're done; instead, structure the code's branches (and employ state as necessary) so that you're only calling show when you mean it.

There are plenty of potential misunderstandings lurking in this example. Due to idempotence of RDP effects, WindowManager.showStringNotification will never display the same string multiple times at once, so it's probably an unrealistic design... although it's just fine for this example! And the FileSystem, WindowManager, and Clock tools would probably be available not as global modules, but as first-class capabilities.

Thanks for providing and

Thanks for providing and explaining the example. Sadly, I lack easy access to such embedded syntax in my Haskell implementation. I wonder how well such deep recursive structure would perform in Sirea.

Idempotent display elements work very well for HCI. It's always sad when the computer is making some distinction that is not obvious to the human user. Idempotence forces developers to explicitly provide distinguishing context (location, relationship, etc.) rather than introduce duplicates. The result is better for both the program and the human user.

Thanks for all explanations.

Thanks for all the explanations. Things are getting clearer and clearer to me, though I feel a bit slow in grasping RDP...

RAII plus improved signatures

I would personally like to see a language that combined RAII/destructors as in C++ (e.g. a file can be closed automatically in the class design instead of in each finally block) with exception signatures a la Java (which do force the programmer to think about where errors are handled, pace the parent article), but with a more convenient syntax for building up lists of signatures (the lack of which in Java perversely forces programmers to use generic exceptions for everything and not think too much about handling them or about their actual origin).

Exceptions Considered Harmful Considered Harmful

After reading the Exceptions Considered Harmful diatribe, I think it can be said with confidence that it's abject nonsense. The issues raised in the blog posting are valid, but they are issues that aren't limited to the use of exceptions: all of them arise when error codes are used as well.

I agree with one conclusion: error handling (in whatever form) works well only when (a) the error is handled locally, or (b) error handling occurs at the outermost scope and is limited to "cleanup before exit". It is also possible to imagine libraries or subsystems that maintain a transactional discipline; in that case exception handling wrapping transaction success/failure would also be a cleanly manageable pattern. But all of these concerns apply equally to both exceptions and error codes. The reason that mid-distance recovery doesn't work well is because the code attempting to handle the error is too far away from the source of the error to know enough to do the right thing. That isn't a syntactic problem. It's a semantic problem.

Let's take the rest in turn:

Hidden Control Flow: Yes, exceptions involve control flow that is not manifest in the source. That was the point. The "check the return code" pattern has been empirically determined to multiply the number of total source lines by a factor of three when followed religiously. The overwhelming majority of those lines exist only to pass the error along. This is true to such an extent that one commonly sees wrapper macros used to reduce the distraction. Bugs are linear in the number of lines, so more lines is a bad thing.
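For reference, this is the sort of wrapper macro meant here (C/C++; openRecord and writeRecord are hypothetical operations that return 0 on success):

// Typical "propagate the error upwards" macro: most call sites only want
// to forward the error code, so the check gets hidden behind a macro.
#define CHECK(expr)                      \
    do {                                 \
        int err_ = (expr);               \
        if (err_ != 0) return err_;      \
    } while (0)

// Hypothetical stand-in operations: 0 on success, error code otherwise.
int openRecord(int id)  { return id >= 0 ? 0 : -1; }
int writeRecord(int id) { return id >= 0 ? 0 : -1; }

int copyRecord(int src, int dst) {
    CHECK(openRecord(src));
    CHECK(openRecord(dst));
    CHECK(writeRecord(dst));
    return 0;
}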

Corruption of State due to Hidden Control Flow: Nonsense. First, it is perfectly possible to prevent hidden control flow by not calling procedures within the state-updating portion of the code. Second, the possibility of an error arising in the middle of a sequence of updates can arise with error codes as well. The argument that error handling must be done with care when state updates are involved is sound. There is a tacit implication here that transactions might provide some useful ideas for such situations. But these issues arise regardless of which error handling mechanism is used. There may be an argument that one system or the other carries less syntactic weight in this situation. That requires a quantitative analysis of code that I haven't seen done.

Osterman: Larry Osterman's blog entry deals specifically with Microsoft's Structured Exception Handling, which he notes is worse than conventional exception handling. Patterson's quote is out of context; Osterman argues against SEH specifically, not against exceptions generally. SEH is problematic because it isn't threaded properly through the application callback system (upward throws can be lost by the OS in some circumstances) and also because control flow can proceed downwards from the point of violation when certain hardware events occur. Even within Microsoft, SEH is seen as having significant technical concerns.

Parallel Programming: Both exceptions and error codes are a form of sequential control flow, and both involve challenges with parallel programming. On the other hand, it seems clear that one thread cannot successfully clean up another in either system, and that any sane form of parallel programming must pay careful attention at points where state is exchanged. The parallel programming case is an important generalization of the state update concern, but that is all it is. The problems here are not particular to exceptions. I do agree with the point that propagation of error results across worker thread termination is important and under-attended, but this isn't an exceptions issue particularly.

The truth becomes evident at the implementation level: the implementation of exceptions can and should be seen as an optimization on a special form of multiple value return. The optimization is motivated by both performance and compatibility, and by the observation that most error code checks consist of "return the error upwards if one is received here".

Error Handling Patterns

It seems in several cases you attempt to argue for exceptions by arguing against error codes. Are you assuming a dichotomy between them? (Or did the article assume it? I've not been motivated to read it.) There exists a wide variety of error handling patterns. And there are many more possibilities my list doesn't touch, such as more approaches to multi-return function calls or choice arrows.

The challenge with mid-distance recovery is significantly a consequence of many error handling patterns. You consider the cases of error codes and traditional exceptions, both of which are handled after "unwinding" - even if you return enough information to know how to recover, you can't effectively apply that decision without a lot of rework, so that information is effectively useless. Resumable exceptions and error-counseling patterns offer greater ability to make a decision, and thus greater ability to use information available at the error site.

Of course, there are also many effective mid-distance error handling patterns that focus on robust tolerance rather than recovery. For example, fallback behaviors can result in very robust systems, and are among the strengths of logic and constraint programming models.

Regarding parallel programming: Error codes are data, not a form of sequential control flow. A rigid discipline of checking an error code might involve control flow. But there are many things you can do with error codes other than check them. For example, you can store them in a list or set, or send them to a logger. This flexibility has valuable consequences for integration of parallel computations. Data-based error representations (even error codes) will often serve more effectively for parallelism - especially data-parallelism - than control-flow representations (such as exceptions).
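A small C++ sketch of that flexibility: errors from parallel tasks are joined as plain data, and the caller decides later whether to log, count, or retry them (all names here are illustrative):

#include <functional>
#include <future>
#include <string>
#include <variant>
#include <vector>

struct ErrVal { std::string reason; };            // an error value, just data
using Result = std::variant<double, ErrVal>;

// Run every task in parallel and gather the results. Errors do not unwind
// anything; they travel back through the join like any other value.
std::vector<Result> runAll(const std::vector<std::function<Result()>>& tasks) {
    std::vector<std::future<Result>> futures;
    for (const auto& task : tasks)
        futures.push_back(std::async(std::launch::async, task));

    std::vector<Result> results;
    for (auto& f : futures)
        results.push_back(f.get());
    return results;    // callers can partition, log, or retry the ErrVals
}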

Exceptions also have negative consequences for reasoning about security of code - i.e. capabilities are hidden, rather than manifest in the code. This was what really killed them for me. You say, "Yes, exceptions involve control flow that is not manifest in the source. That was the point." And, indeed, it was a point made with good intentions. That doesn't mean it was a good point.

The truth becomes evident at the implementation level

It'd be wiser, I believe, to avoid confusing the truth of the implementation with the truth of the abstraction.

Weak preference for exceptions

I was responding to the blog post, which compares exceptions to error codes in stateful languages and makes some blatantly silly, biased arguments. My point was that, with one exception, every inconvenience attributed to the use of exceptions is equally an inconvenience of error codes. The notable exception was the argument concerning "hidden control flow", which in my opinion is trumped by concerns of error (bug) rate.

I agree with the rest of what you seem to be saying. Yes, the fallback pattern is useful, though it can readily be implemented in either approach. In the mid-distance case this pattern relies on consistent implementation of state unwind, which is hard enough to do in stateful languages that you don't see it very often in those languages.

I mostly agree with you about parallel programming. In my mind, the parallelism complaint in the original note was focusing on parallel execution of [leaf] threads exhibiting sequential control flow. Within the sequential components, error codes and exceptions both provide a form of data-carrying control flow: the first explicit and the second mostly implicit. I do agree that when an error (in either form) propagates to the outermost scope of a sequential [leaf] thread, the information embodied in the error needs to be conveyed as data back into the gather/join mechanism, where it eventually emerges as a [possibly partial] result of the join. What happens when a join result has constituent error results depends on the subsequent actions of the program. In effect, the exception turns into a distinguished result value (that is: an error code) at the join boundary. The same issue and resolution exists in other forms of dataflow-style computation, where the structure of the computation may not be constrained by a lattice as is true in strict fork/join models. The E language provides one example. So yes, I agree that in the general case the exception style needs to be transformable into the error result style in parallel programs.

There is certainly a security challenge associated with the hidden nature of exceptional control flow. That challenge has to be weighed against the security challenge of tripling the number of lines of code. I suspect one can find individual codes for which either approach is clearly better from a security standpoint, but in my experience the exception-carrying style has been much easier to deal with in my own code. Perhaps ironically, my preference for exceptions is strongest in critical code. This is true because such code is best written using purpose-selected idioms appropriate to critical systems, notably including a transactional operational model, in which exceptions should only be permitted prior to commit. When there is nothing (or nearly nothing) to unwind, the security audit issues arising from exceptions are pretty well non-existent.

Your point about implementation vs. abstraction is generally sound. What I was trying to suggest is that the control flow behavior of the two idioms is actually identical, so arguments based on the structure of the control flow or its impact on state are unlikely to discriminate usefully between exceptions and error codes.

The tradeoff you propose

The tradeoff you propose between exceptional control flow and tripling the number of lines of code follows from that assumed dichotomy between error codes and exceptions. If you were constrained to a choice between error codes and exceptions, then I suppose I would understand the preference for exceptions. But you're not. There is no dichotomy.

Even if you do use exceptions, many faults of exceptions could be addressed by simply requiring an explicit prefix to propagate exceptions. E.g. use the phrase `[throws] Expression` to indicate that this expression may throw an exception that is not handled locally. (This would be better represented at the expression level, rather than the procedure level like Java, to represent explicit passing of exceptional capabilities.)

I favor dataflow approaches (and being rid of control flow entirely), but I also favor designs that make it really cheap (syntactically) to punt error handling to a later step, preferably as if it were handled inline. Causal Commutative Arrows are excellent for this: prefix actions to address only success cases (e.g. with `left Action` or `first Action`), and handle the errors later. (Commutativity provides the "as if handled inline" feature.)
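A loose Result-based analogue of that style (this is not an arrow, and the commutativity property is not captured; the pipeline is invented for illustration): each `map` step addresses only the success case, and the error value flows through untouched until a later step chooses to deal with it, much as if it had been handled inline.

```rust
fn main() {
    let pipeline = |input: &str| {
        input
            .parse::<i32>() // Result<i32, ParseIntError>
            .map(|n| n * 2) // runs only on success
            .map(|n| n + 1) // still only on success
    };

    // Error handling is punted to the very end.
    for input in ["20", "x"] {
        match pipeline(input) {
            Ok(v) => println!("ok: {}", v),
            Err(e) => println!("handled later: {}", e),
        }
    }
}
```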

Experimentally, I find...

I have "error objects" as a data type in s lispish language with functions and procedures that potentially return multiple values.

When actually writing code, I find that the "right thing to do" in most functions is fairly exception-like -- if an evaluation within a function returns an error object, or a function gets an error object as an argument, and the function doesn't test for and immediately handle it, then the function itself should abort and return the same error object.

But procedures (i.e., calls that have side effects, whose semantics might differ if their arguments are evaluated in different orders, that may not evaluate all of their arguments, or whose semantics may vary with anything not passed in as an argument, such as external or internal state) have a notion of internal control flow, and that kind of immediate abort would often give procedures confusing or otherwise "wrong" semantics, leaving, for example, some expressions needed for consistency purposes unevaluated. So I find that in procedures it's generally best to treat an error object that is not tested for and handled as "just another value" and finish evaluating the procedure call, propagating the error value according to contagion rules.
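A small sketch of the two policies in Rust (the names are invented): `add` plays the role of a function and aborts on contact with an error object, while `log_both` plays the role of a procedure, still performing its side effects and letting the error value spread by contagion.

```rust
#[derive(Debug)]
enum Value {
    Num(f64),
    Error(String),
}

// "Function" policy: the moment an argument is an error object, abort and
// return that same error object.
fn add(a: &Value, b: &Value) -> Value {
    match (a, b) {
        (Value::Error(e), _) | (_, Value::Error(e)) => Value::Error(e.clone()),
        (Value::Num(x), Value::Num(y)) => Value::Num(x + y),
    }
}

// "Procedure" policy: keep evaluating every step for its side effects, and
// let the error value propagate by contagion instead of aborting early.
fn log_both(a: &Value, b: &Value) -> Value {
    println!("logging a = {:?}", a); // side effect still happens
    println!("logging b = {:?}", b); // even when a is an error object
    add(a, b)                        // the result is contaminated if either was
}

fn main() {
    let bad = Value::Error("division by zero".to_string());
    let good = Value::Num(2.0);
    println!("{:?}", log_both(&bad, &good)); // Error("division by zero")
}
```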

I don't know if there is any general support for this distinction, but that's what I observe makes my programming life easier when working with a language.

Ray

Just sticking to the discussion

The tradeoff you propose between exceptional control flow and tripling the number of lines of code follows from that assumed dichotomy between error codes and exceptions.

I assume no such dichotomy. I respond to the content of the original blog entry, which considers only these two options. That said, the dichotomy does appear to exist in mainstream programming languages such as Java, C# and C++.

Explicit annotation of propagation has been tested in practice, and is now generally recognized as unusable. Adding it is entirely sufficient to preclude adoption of a language, and in the presence of appropriate tooling it adds very little value.

I can imagine an IDE that

I can imagine an IDE that visually annotates expressions with some information about what they might throw. But I've not seen "appropriate tooling" in mainstream IDEs. :)

Providing error-processing capabilities explicitly is valuable for controlling authority. If people have trouble annotating exception-throwing code, I think that would count more as a mark against exceptions than a mark against explicit grant of authority. Either way, it's always worth revisiting solutions that people "generally recognize as unusable" because people often generalize prematurely or misattribute problems.

the dichotomy does appear to exist in mainstream programming languages such as Java, C# and C++

The path of least resistance in those languages does encourage the dichotomy, but doesn't enforce it. I've seen and used other error handling patterns in mainstream languages (error objects, deferred exception objects, error counseling, etc.). Language design is really all about controlling that path of least resistance, to make the designer's "right thing" obvious and easy.

It's not possible

In the presence of libraries, precisely enumerated annotation of exceptions isn't really possible. The issue hinges on whether the exceptions thrown are part of the specification contract of a library routine. The practical reality is that they can't be. There have been well-motivated examples where the set of exceptions (or error codes) that might be thrown (or returned) from a library routine has needed to evolve. If thrown exceptions must be enumerated, the practical consequence is that every detailed annotation in every [transitive] caller needs to be updated. Particularly in the presence of dynamic shared libraries, this doesn't work out well.

Often, it turns out that the new exception/error code relates to some new functionality that was previously not present and is likely not reachable by pre-existing applications. There is no benefit to making those applications fail to type check when DLL features that they do not use are introduced.

The next option is to say "well, simply annotate whether it does or does not throw". The safe practice is to assume that everything does throw. Given this assumption, the addition of an annotation to everything in sight adds pretty limited value. If any annotation is to be added, it should be added to the rarer case: functions that do not throw.

It is possible to enumerate

It is possible to enumerate every error a library might expose. With generic types and algebraic sums, we can propagate errors without enumerating them. In that case, a change in error types will only require touching the code that operates on those specific errors. Exceptions generally have a weakness of being non-algebraic (i.e. we lose information about who threw). But there are still ways to annotate and enumerate them.
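A Rust sketch of that point (all names invented): the combinator below is polymorphic in the error type, so it threads errors through without ever enumerating them, and the library's error enum can be marked so that, were it a separate crate, new variants could be added without breaking downstream code that never matches on them.

```rust
// Propagates the first error of whatever type E happens to be; this code
// never has to name or enumerate the library's errors.
fn map_all<T, U, E>(items: Vec<T>, f: impl Fn(T) -> Result<U, E>) -> Result<Vec<U>, E> {
    items.into_iter().map(f).collect()
}

// Hypothetical library error type. With #[non_exhaustive], external callers
// must keep a wildcard arm, so adding a variant later is not a breaking change.
#[non_exhaustive]
#[derive(Debug)]
enum LibError {
    NotFound,
    Timeout,
}

fn lookup(key: u32) -> Result<String, LibError> {
    match key {
        0 => Err(LibError::NotFound),
        99 => Err(LibError::Timeout),
        _ => Ok(format!("value-{}", key)),
    }
}

fn main() {
    // map_all never mentions LibError; it just threads it through.
    println!("{:?}", map_all(vec![1, 2, 3], lookup));
    println!("{:?}", map_all(vec![1, 0, 3], lookup));
}
```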

You suggest we introduce features and error modes to a DLL without exposing the error modes, for backwards compatibility reasons. Yet you're giving up the ability to reason easily about errors today in order to achieve a rare flexibility benefit tomorrow. That doesn't seem a wise tradeoff. Really, it doesn't even seem a necessary tradeoff. You could instead design your language to support multi-versioned libraries (e.g. libraries via objects) or extensible types (tackling the expression problem).

The safe practice is to assume that everything does throw.

That's an expensive assumption, and (for most expressions) an untrue assumption. It's much more convenient for programmers if they are able to isolate errors to certain borders where they're easier to handle. Often, this means gatekeeper patterns, code that validates input before processing it.

Use static analysis to type call sites.

Often, it turns out that the new exception/error code relates to some new functionality that was previously not present and is likely not reachable by pre-existing applications. There is no benefit to making those applications fail to type check when DLL features that they do not use are introduced.

I question the benefit of a model of effect types in which an exception that a given call cannot reach causes an application to fail to type check. Remember that, fundamentally, static typing is the process of statically deriving a conservative approximation of runtime typing.

In runtime typing, it is individual calls to a function, and not the function itself, that return values of particular type or invoke effects of a particular type. Most traditional systems typecheck against functions and variables as a shortcut to a conservative static approximation of runtime typing, but a finer-grained approximation can be made (still via static analysis and without actually running the code!) by considering individual call sites instead.

Individual call sites usually give us several constant arguments, and other arguments restricted to a small subset of possible types or values. A particular call site can usually be statically typed far more closely than the function itself.
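Rust does not type the call sites of a monomorphic function any more precisely than the function itself, but generic functions give a flavor of the claim: every call site instantiates the function's type schema at something strictly more precise, including the error component. A sketch (the `convert` helper is invented):

```rust
use std::convert::{Infallible, TryFrom};
use std::num::TryFromIntError;

// One generic function; its error type is abstracted over the calling context
// and becomes concrete -- and more precise -- at each call site.
fn convert<T, U: TryFrom<T>>(x: T) -> Result<U, U::Error> {
    U::try_from(x)
}

fn main() {
    // Call site 1: widening u32 -> u64 cannot fail, and this site's type
    // (Result<u64, Infallible>) says so; there is nothing to handle.
    let wide: Result<u64, Infallible> = convert(7u32);
    let wide = match wide {
        Ok(v) => v,
        Err(never) => match never {}, // statically unreachable
    };
    println!("{}", wide);

    // Call site 2: narrowing u64 -> u8 can fail, and only this site sees an
    // error type that needs handling.
    let narrow: Result<u8, TryFromIntError> = convert(300u64);
    println!("{:?}", narrow);
}
```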

If new functionality has been added to a library, then it either is, or is not, the case that existing calls to that library can invoke the new functionality. If it is not the case that existing calls can invoke the new functionality, and you can prove it, then you are looking at a universe of existing calls which can be statically proven not to invoke the new functionality and which therefore do not need to be typed in such a way that they have to handle an exception which can only be thrown by the new code.

If it is the case that existing calls to that library can invoke the new functionality, then the new exception might be thrown in response to existing calls. In that case, failing to use an appropriate effect type to force existing applications to handle this case would be a graver error than making them fail to type check.

In order to make much progress with static typing of effects in the presence of exceptions, I think you need to perform your static analysis at the level of call sites rather than functions.

Ray

I question the benefit of a

I question the benefit of a model of effect types in which an exception that a given call cannot reach causes an application to fail to type check.

Why is this so bizarre? Pattern matches must be exhaustive even if your program never hits those cases. Are you skeptical of data flow typing for this reason too?

This is an objection, not skepticism.

I'm not "skeptical" of the approach. Obviously, it is sound and it works. I object to it because I don't believe that it provides as much value as a different sound method that also works.

The static type of a function is a superset of the static type of any particular call site. For simple (argument/result) typing, the static types of the call sites are usually identical (i.e., subsets but not proper subsets) to the static type of the function.

But effect types are much more strongly dependent on the particular method used to reach a result (i.e., the code path through the function in the specific case), and with complex typing (arguments/environment/results/effects) the type of a call site is frequently a *proper* subset of the type of the function.

So, while there's nothing 'bizarre' about applying the (complex) type of a function using the usual (simple) methodology, and it will definitely work, I contend that with complex types, working at the function level rather than the call level amounts to ignoring useful information in type analysis.

In short, I object to a system that requires handlers for exceptions in cases where it's statically provable that those exceptions will not be thrown. In fact, I am very certain that most programmers will object to it, if made aware of a choice.

Ray

Simple Approximations

Rather than thinking of conservative approximations as a weakness, I think of them as a feature: if a machine has difficulty reasoning about the correctness of a program, you can bet a human will have difficulty correctly reasoning about the program. Thus, static types can force humans to either simplify their programs or be much more explicit when detailing the complicated parts.

Dependent types are the intersection of "static analysis to type call-sites" and modularity. I would like to see more dependent typing in languages. Dependent types generally require total languages, but are an easy fit for FRP, RDP, synchronous reactive, etc. where there is never any need for local expression of non-termination.

Anyhow, regarding Shapiro's comment, I was envisioning a broader possibility: the new functionality and exceptions may depend on state or an implicit configuration. A static configuration could be reified for easy reasoning. If functionality is enabled by state, then we have a much broader problem of reasoning about when or whether that state is ever set (calling for difficult Hoare logic or similar).

Symbiotic reasoning via machine and human

I think of them as a feature: if a machine has difficulty reasoning about the correctness of a program, you can bet a human will have difficulty correctly reasoning about the program. Thus, static types can force humans to either simplify their programs or be much more explicit when detailing the complicated parts.

Don't we need much more interactive IDEs? I forget her name, but she is a research scientist who used to work at Sun and who said that we need much more "forgiving" programming environments. She basically postulated that we need a questioning environment to resolve issues.

Function typing subsumes call site typing

I disagree with your core argument that "call site typing" is fundamentally more powerful than "definition site typing". Functions are abstracted over their arguments, but in a powerful type system they're abstracted over some part of the context as well (such as the set of resources available, the types at which they will be specialized, etc.). What you consider "local to the call site" is just a specific form of contextual information you can abstract over and expose in the function type.

If you know that a given call site respects condition X, and another respects condition Y, you can type the function saying that "if X holds then ..., or else if Y holds then ...". This will be easy to extend later to a third call site that also satisfies condition X.
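A very literal Rust sketch of that suggestion (names invented): the conditions X and Y are reified as types, and the function's contract, including its error type, is stated separately under each condition, so the call site known to satisfy X gets the "cannot fail" contract.

```rust
use std::convert::Infallible;

// "If X holds then ..., or else if Y holds then ..." encoded as a trait.
trait CallerCondition {
    type Error;
    fn run(input: u32) -> Result<u32, Self::Error>;
}

struct ConditionX; // e.g. the input was already validated
struct ConditionY; // e.g. the input comes straight from the outside world

impl CallerCondition for ConditionX {
    type Error = Infallible;
    fn run(input: u32) -> Result<u32, Infallible> { Ok(input + 1) }
}

impl CallerCondition for ConditionY {
    type Error = String;
    fn run(input: u32) -> Result<u32, String> {
        if input == 0 { Err("unvalidated zero input".into()) } else { Ok(input + 1) }
    }
}

fn main() {
    // A call site known to satisfy X gets the "cannot fail" contract...
    let v = match ConditionX::run(41) {
        Ok(v) => v,
        Err(never) => match never {}, // statically unreachable
    };
    println!("{}", v);

    // ...while a call site that only satisfies Y must handle the error.
    println!("{:?}", ConditionY::run(0));
}
```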

A type is really a static description of the behavior of a program. All you can deduce about the function should be encodable in some type system, to capture the different behaviors corresponding to the different call sites.

Now I don't think this design would be very good anyway. If you cannot share behavior and specification between call sites at all, this is probably a sign that you should split your function into two different functions, each with a more precise, and easier to manipulate, behavior. Of course I suppose your example comes up in situations where the behaviors at the two call sites are partially shared, so merging the specifications is valuable (though maybe a factorization through two functions calling a common helper would be enough). I still think you'll gain by changing your code to have the simplest specifications possible, and this means that the precision phenomenon you describe should be very rare -- and possibly a sign of unsatisfying design.

That's a good argument for

That's a good argument for why one of "this code may throw some unknown type of exception" or "this code may throw some standard catch-all class of exception" should be the default case.

In the presence of libraries, precisely enumerated annotation of exceptions isn't really possible. The issue hinges on whether the exceptions thrown are part of the specification contract of a library routine. The practical reality is that they can't be. There have been well-motivated examples where the set of exceptions (or error codes) that might be thrown (or returned) from a library routine has needed to evolve. If thrown exceptions must be enumerated, the practical consequence is that every detailed annotation in every [transitive] caller needs to be updated. Particularly in the presence of dynamic shared libraries, this doesn't work out well.

Often, it turns out that the new exception/error code relates to some new functionality that was previously not present and is likely not reachable by pre-existing applications. There is no benefit to making those applications fail to type check when DLL features that they do not use are introduced.

But here is an argument for why it is useful to have language support for exception signatures anyway. Consider the question a programmer would ideally like to ask at a call site: "What are all the exceptions that could be thrown if I call this function?" Clearly that would sometimes be nice to know. Let's say for the moment that this information could conceivably come from a combination of language and tool support. Would the ideal information be a listing of all sites that could throw an exception - possibly including, say, every dynamic memory allocation? That would not be the most convenient form - it would probably be more useful to have the information broken out into categories/types.

Should those categories/types be determined only by the designer of the language and the designers and implementors of the standard libraries? It's easy to think of cases where that is not ideal either - e.g. if different types of I/O devices can throw different categories of exceptions based on semantics known only to the application programmer. Therefore it would potentially be useful to allow library and application programmers to annotate categories of exceptions in some way, and in fact most languages with exceptions do allow this.

Where existing languages fall down, however, is in their ability to simultaneously
a) make it easy to write non-brittle transitive code for the kinds of cases shap describes,
b) not lose the original exception annotation that the programmer felt was relevant and important to begin with, and
c) provide language support for breaking out the cases of interest at some chosen level of granularity and making sure that all cases in that breakout are actually handled.
I don't see that as an impossible or pie-in-the-sky request. It just doesn't seem to be implemented in the languages I'm familiar with.
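As a sketch of the kind of layered categories this asks for (Rust, with invented error names): the library keeps the precise cause, the caller chooses the granularity at which cases are broken out, and the compiler checks that every case at that granularity is actually handled.

```rust
#[derive(Debug)]
enum StorageError {
    DiskFull,
    Corrupted { offset: u64 },
}

#[derive(Debug)]
enum NetworkError {
    Timeout,
    ConnectionReset,
}

// The library-level category type wraps the finer-grained causes.
#[derive(Debug)]
enum LibError {
    Storage(StorageError),
    Network(NetworkError),
}

fn handle(e: LibError) {
    match e {
        // Coarse handling of one whole category...
        LibError::Network(_) => println!("retrying after a network problem"),
        // ...fine-grained handling where the distinction matters, without
        // losing the original annotation.
        LibError::Storage(StorageError::DiskFull) => println!("freeing space"),
        LibError::Storage(other) => println!("unrecoverable storage error: {:?}", other),
    }
}

fn main() {
    handle(LibError::Network(NetworkError::Timeout));
    handle(LibError::Network(NetworkError::ConnectionReset));
    handle(LibError::Storage(StorageError::DiskFull));
    handle(LibError::Storage(StorageError::Corrupted { offset: 42 }));
}
```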

I can imagine an IDE that

I can imagine an IDE that visually annotates expressions with some information about what they might throw. But I've not seen "appropriate tooling" in mainstream IDEs. :)

It's probably not what you meant, but Eclipse does have some visualization support for exceptions. If you select an exception type in a throws clause or catch block, it will highlight in the method or try block all places where exceptions of that type are thrown. Similarly if you select a method's return type it will highlight all method exit points, which includes exceptions.

This doesn't work with runtime exceptions though.

What IS an exception?

In my opinion, the concept of an exception is too vague and poorly designed, and that's the reason developers have difficulties with it. What's even worse, in this poorly designed form it has penetrated many mainstream languages and is considered 'standard' and mandatory.

There must be a better approach, and there is one. See, for example, how it's done in Aha!.

Exceptions

In languages with exceptions, the exception mechanism typically has a precise definition and operational semantics. I think `vague` is not the problem, but `poorly designed` is more promising.

Exception mechanisms in practice have several common characteristics:

  • exception mechanisms involve control flow
  • an exception object carries developer-provided information about an error
  • the handler is implicit and context-sensitive
  • the handler is selected based on the object carrying the information

These characteristics may be broad and somewhat vague, but that doesn't mean we cannot trace problems to them. E.g. assumptions about control flow interfere with abstraction of alternatives (concurrency, parallelism). And handler selection based on the exception object can make it difficult to compose errors or handle more than one at a time. Carrying developer-provided information on an implicit control flow is twice a curse: exception mechanisms are both insecure and generally fail to carry information that is actually useful for recovery.

I believe that, should you develop an error handling pattern that avoids the problems of exceptions, it will also not be recognizable as an exception mechanism - not even broadly and vaguely, unless you stretch the definition of 'exception' itself to such an extent that it becomes unrecognizable.

The failure mode of Aha! is not an exception mechanism. Failure in Aha! is binary: it does not contain information, nor is there a handler specific to the information it contains. There is no control-flow involved. I consider Aha's failure model much closer to logic and constraint programming (with the `any` structure describing backtracking search).

Absolutely.

The failure mode of Aha! is not an exception mechanism.

That's what I'm saying. It's a (better) alternative to one, much better than using error codes.

Vague is the concept, not semantics

What I meant by 'vague' is that it's not entirely clear when exceptions should be used and what information they should carry. In my experience, I tend to avoid using them, except for assertions, because exceptions can easily make the program unstable.

In contrast, failure in Aha! is a simple, clear and universal concept: whenever some code can't achieve its goal, it fails, and any data it produces are unavailable - i.e. any code that depends on these data fails too. For example, if a loop doesn't find a value in a sequence that satisfies given criteria, it fails.
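A rough analogue of that contagion in Option terms (this is not Aha! itself; the numbers are arbitrary): a search that finds nothing fails, and everything computed from its result fails with it.

```rust
fn main() {
    // Fails if the sequence contains no value satisfying the criterion; any
    // computation depending on the found value then fails as well.
    let compute = |xs: &[i32]| -> Option<i32> {
        let found = xs.iter().find(|&&x| x % 2 == 0)?;
        Some(found * 10)
    };

    println!("{:?}", compute(&[1, 3, 5, 7])); // None: the loop "failed"
    println!("{:?}", compute(&[1, 2, 3]));    // Some(20)
}
```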

I'm a great believer in simplicity.