Monads vs. Actors

I just read Gilad Bracha's post "Why Monads Might not Matter." He argues that monads are probably too complicated for most programmers to understand, and that actors might work better as a primary abstraction to solve similar problems. Since I share a similar school of OO thought, this post really resonates with me, although I wonder whether we are still misunderstanding monads. In particular, are monads meant to be...

  1. Primary programmer abstractions for interfacing with all things stateful, with grand composability benefits arising from such uniformity.
  2. Abstractions for language/library designers to build programmer abstractions that hide their monadic heritage, perhaps to be revealed only to other language designers who are building other composable abstractions.
  3. Beautiful/elegant abstractions for describing how state is manipulated in a pure functional programming language, but not necessary or practical when an impure language is being used.

Glancing at the comments, I see many arguments. Monads have supposedly been successfully used in scalaz, but I wonder if this is because Scala has attracted more Haskell programmers than Java programmers as of late. Monads and more general arrows have been used to describe FRP, but I always thought that this was an implementation detail that distracted from the elegance of the FRP abstractions themselves (and as a result I really didn't get FRP until making the connection with FrTime). Erik Meijer et al. have brought monad-based technology to .NET via yield statements, LINQ, and Rx, but again...when using these abstractions, I don't see or care about whatever monads are involved. Am I still missing the point?

Surely actors can be implemented with monads, but that isn't the point. What should programmers see?

Monads matter

As you say, actors can be implemented with monads, and I hate to be a contrarian, but that *is* the point.

Actors, as Gilad vaguely describes them in that post, import many particular monadic features (in essence, we're talking about a very complicated monad that's expressive enough that it can subsume many common use-cases for monads). Really, falling back on actors is no better than falling back on implicit state/IO that you get in C/C++/Java/etc. This is like the whole typed/untyped "debate" on a limited scale.

I think that your proposals 1 and 2 reasonably cover the sort of tool that monads are. In particular, you might find Phil Wadler's early papers on the subject very instructive (as in his modular interpreter, where monads are a particularly useful tool indeed).

Not even wrong

To partially answer your question about "what monads are supposed to be," I'll respond to your list. 1) No. 2) No. 3) No.

Here's a list in response.
1) Monads aren't about state and aren't about IO.
2) Something being a monad simply means that it supports a particular interface, and this is true whether you recognize/exploit it or not.
3) Monadic style can be useful and has been used in impure languages.

The main concrete benefit of recognizing that something forms a monad is that you can reuse all the operations defined for arbitrary monads. Failing to recognize that something is a monad doesn't keep you from doing anything; you'll just reinvent/rewrite existing functions.
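
To make the reuse point concrete, here is a minimal Haskell sketch (ordinary Prelude code, purely illustrative): sequence is written once against the Monad interface and then works unchanged for Maybe, lists, IO, and so on.

    -- 'sequence' is defined once for an arbitrary monad; every line below reuses it.
    main :: IO ()
    main = do
      print (sequence [Just 1, Just 2, Just 3])   -- Maybe:  Just [1,2,3]
      print (sequence [Just 1, Nothing, Just 3])  -- Maybe:  Nothing
      print (sequence [[1,2],[3,4]])              -- lists:  [[1,3],[1,4],[2,3],[2,4]]
      xs <- sequence [getLine, getLine]           -- IO:     reads two lines
      print xs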

It doesn't make sense to compare actors and monads. It doesn't really make sense to say something can be implemented "with monads." It doesn't make sense to say that monads are an "implementation detail" at least in the way I believe you intend it.*

Ignoring the fact that his premises are wrong and trying to fill in the details in a very forgiving way, actors would still fail to achieve what he thinks they can as far as handling impurity or even just IO in a pure language. Going further and actively distorting things to make Gilad be right somehow, he's suggesting (in this hypothetical world) a system similar to the original stream-based IO system of Haskell.

* Exploiting implementation details can lead to being able to prove a particular implementation is a monad, much like I can implement IEquatable and IComparable if I know that the hash function is injective.

Clarification

If I understand correctly, a monad is a form of abstraction around some implementation. When I say "implementing something with monads" I really mean "implementing something with something else that is abstracted as a monad." I'm sure Gilad's thinking was similar. Hopefully this clears things up a bit.

What does "abstraction

What does "abstraction around some implementation" mean? What does "abstracted as a monad" mean? Certainly it doesn't mean abstract data type, as, if all you knew about something was that it was a monad, you'd know nothing. And certainly it doesn't mean writing code that uses things that happen to be monads as I'm sure neither Gilad nor you want to claim that any program using lists is beyond the ken of mere mortals and would be better done with actors.

You (Sean) probably mean something like "an implementation written in monadic style," and again I have to say, something being a monad or an instance of monadic style doesn't go away if you're unaware of it. For something in monadic style, if you didn't know about monads, it would instead be in error propagation style, or state passing style, or continuation passing style, or backward state passing style, or environment passing style, or state/continuation/environment-passing error propagating listful logging style. Certainly, some of these "styles" aren't necessary in impure languages, but most languages don't have first-class continuations or Prolog-style non-determinism, for example, and thus would be required to adopt these styles as well. Recognizing that these are all instances of one simple, uniform style allows, at the very least, supporting syntax such as C#'s Linq or F#'s workflows. I can, if I want, implement (trampolined) continuation passing style in C# using Linq to support first-class continuations or lightweight threads.
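
To make one of those "styles" concrete, here is a small Haskell sketch (purely illustrative; safeDiv, calc1 and calc2 are made-up names): calc1 is hand-rolled error propagation style, and calc2 is the same program once the Maybe monad - and hence do-notation - is recognized.

    -- Hand-rolled "error propagation style": the plumbing is repeated at every step.
    safeDiv :: Int -> Int -> Maybe Int
    safeDiv _ 0 = Nothing
    safeDiv x y = Just (x `div` y)

    calc1 :: Int -> Int -> Int -> Maybe Int
    calc1 a b c =
      case safeDiv a b of
        Nothing -> Nothing
        Just r1 -> case safeDiv r1 c of
          Nothing -> Nothing
          Just r2 -> Just (r2 + 1)

    -- The same program once the Maybe monad is recognized: the plumbing lives in (>>=).
    calc2 :: Int -> Int -> Int -> Maybe Int
    calc2 a b c = do
      r1 <- safeDiv a b
      r2 <- safeDiv r1 c
      return (r2 + 1)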

The only sense in which saying "you can implement actors with monads" makes sense is in the same sense as "you can implement web servers/video games/databases with monads."

At any rate, Lennart's, Edward's, sigfpe's, and Tony's responses are more than adequate.

My point was more about

My point was more about wrapping monads up in another abstraction, and then composing them through these other abstractions behind the programmer's back. These abstractions could merely be syntax, or they could be something else. If the programmer is programming in monads but doesn't know that, are monads still important to them? They are no longer programmer-level abstractions in that case; they are implementor abstractions. The fact that monads or continuations or imperative updates are being used is just a detail, right? I'm sure most C# programmers have not even heard of monads even if they are using LINQ (which is really based on the more specific notion of list comprehensions, but I digress).

Now, if the programmer is exposed to monadic style, then they are important to them. Should a language support that? It appears to me that Gilad is arguing for actor style vs. monadic style as what programmers of Newspeak should see and use. Of course, one could always go back and do monads in Newspeak, but it wouldn't be able to compose with standard libraries easily (arguably a part of the language), while if you decided to base your code on actors you'd have a much easier time.

Style is an important decision to be made in a programming language that has far-reaching implications. This is what ultimately bothered me about Scala: it supports a hybrid programming model, with a very nice object system and great FP support. But...a decision had to be made in the standard library: FP or objects? Ultimately I think FP won out and I found myself increasingly alienated as an OO person who appreciates FP but doesn't live by it. But you can't please everyone, and a PL needs to focus on a style.

Properties and interfaces

You seem to be talking about different notions of monads.

You are asking whether it's a good idea to define some interface in a programming language that is supposed to represent a thing being a monad, and for programmers to write code directly to that interface. If that's the question, then at least for IO and State I think something like a finer effect system (for IO) or linearity (for more flexible state threading) is probably better than using a plain Monad interface.

Derek is focusing quite precisely on the mathematical question of whether some collection of objects and operations forms a monad, from which point of view talking about them as abstractions is a bit hard to understand.

You might ask "If a programmer is using associative functions but doesn't know that, is associativity still important to them?". Note the ambiguity in "associativity". That might make it clearer that the phrasing can be read as two questions. Is the programmer doing anything that takes advantage of associativity? Probably yes - maybe casually switching to a parallel fold. Does the programmer need to know the term "associativity", and a formal definition of the term, and consciously realize the formal definition applies to what they are working with? Maybe not.
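
As a toy Haskell illustration of the parallel-fold point (my own example; total and chunkedTotal are made-up names): because (+) is associative with unit 0, a list can be split into chunks, each chunk summed independently - potentially in parallel - and the partial sums recombined without changing the result.

    -- Associativity (plus a unit) is what makes the chunked fold equal to the plain one.
    total :: [Int] -> Int
    total = foldr (+) 0

    chunkedTotal :: [[Int]] -> Int
    chunkedTotal chunks = foldr (+) 0 (map total chunks)
      -- each call to 'total' here could be evaluated on a separate core

    main :: IO ()
    main = print (total [1..10] == chunkedTotal [[1..5], [6..10]])   -- True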

Under this interpretation, "wrapping monads up in another abstraction" can only be unpacked as implementing some interface which is weaker than a monad in terms of primitives which do have monadic structure. That suggests to me some kind of regulated reactive programming (which probably isn't a monad unless you can receive a stream processor along a stream, and then delegate to it for all further events), implemented with things like threads and variables.

actors would still fail to

actors would still fail to achieve what he thinks they can as far as handling impurity or even just IO in a pure language. Going further and actively distorting things to make Gilad be right somehow, he's suggesting (in this hypothetical world) a system similar to the original stream-based IO system of Haskell

Eh, you misread Bracha. He never said anything about a "pure language". He was discussing a disciplined use of "pure actors" (that is, stateless actors) that may happen to hold references to "impure actors" - e.g. via a powerbox or dependency injection.

Please tell me how I am supposed to read the following

The most important practical contribution of monads in programming is, I believe, the fact that they provide a mechanism to interface pure functional programming to the impure dysfunctional world.

The thing is, you don’t really need them for that. Just use actors. Purely functional actors can interact with the stateful world, and this has been known since before Haskell was even conceived.

Exactly as written.

Purely functional actors can interact with the stateful world

Purely functional actors are simply actors that are stateless. They can create more actors (similar to first-class functions), and may forward or transform messages, but cannot 'become' or change local state.
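
Roughly, in Haskell terms (my own sketch, not Hewitt's or Bracha's formalism; Address, Effect, Behavior and forwarder are made-up names): the behavior of a stateless actor is a fixed function from an incoming message to a batch of sends and newly created actors, with no 'become' and no local state.

    newtype Address msg = Address Int            -- opaque actor reference

    data Effect msg
      = Send (Address msg) msg                   -- forward or transform a message
      | Create (Behavior msg)                    -- spawn a new actor

    -- A purely functional (stateless) actor: its behavior never changes.
    newtype Behavior msg = Behavior (msg -> [Effect msg])

    -- Example: an actor that forwards every message it receives to a fixed target.
    forwarder :: Address msg -> Behavior msg
    forwarder target = Behavior (\m -> [Send target m])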

Actors operate by message-passing. Even a purely functional actor can send messages to external systems and services, or receive messages from them.

With a slightly broader reading, and based on his comments, I also understand that Bracha is justifying decisions in Newspeak, which is an object capability language with duck typing. Such a language provides a number of advantages similar to pure functional programming with regards to confining effects. Some careful discipline to use 'pure' objects could allow backtracking and other neat features. Why, he asks, would we need monads when we can simply use some pure objects and message passing?

Actors operate by

Actors operate by message-passing. Even a purely functional actor can send messages to external systems and services, or receive messages from them.

This is simply a call-by-value language; I don't see how actors themselves solve anything. It's just the strictness that removes the problems with side-effects.

How would you handle call-by-need as would be required to move Haskell to actor I/O?

Wrong problem

I don't believe that laziness vs. strictness is the issue at all. Actors model certainly isn't known for 'strict' ordering of operations. If you want laziness roughly equivalent to Haskell IO, it is entirely feasible to implement actors with eager transport of lazily evaluated messages.

Rather, the problem being 'solved' is expressing interaction and reaction when integrating a pure subprogram with an outside world. How do we express side-effects? How do we control them? (Gilad didn't touch on the motivation for using pure subprograms.)

What does actors model offer that an arbitrary call-by-value language might not? The object capability properties come to mind; even in Hewitt's original definition, it was clear that actors only obtain references to other actors at birth, by message, or by parentage. Stateless actors would be further restricted in which references they may use. Ocap patterns allow expression and precise control over effects, albeit in a different manner than monad types (more modular, but less accessible to cross-cutting transforms).

What does actors model offer

What does actors model offer that an arbitrary call-by-value language might not? The object capability properties come to mind; even in Hewitt's original definition, it was clear that actors only obtain references to other actors at birth, by message, or by parentage.

No more than it is in the lambda calculus. The only capability violations in the lambda calculus are environment extensions, ie. ambient authority, and actors suffer the exact same deficiency. So I don't see what problem actors solve in this context, nor do I see how actors solve the I/O problem which started this thread. You need some well defined language for specifying sequential messages to reason about I/O, at which point you've reinvented monads.

Ocap patterns allow expression and precise control over effects, albeit in a different manner than monad types (more modular, but less accessible to cross-cutting transforms).

They don't in the presence of laziness, which is entirely the point. OCap languages only permit precise control over effects because they've all been strict languages. You need some abstraction which permits the specification of sequencing in the presence of laziness when needed, which a monad is. Which isn't to say you can't perhaps find something a little less "strict" (like algebraic effects).

Laziness is not the issue

Laziness is not the issue. Monads are also used in F#, for example, which is not a lazy-by-default language (nor pure, admittedly). The issue is interfacing a pure subprogram with an effectful world, not strict vs. lazy evaluation.

The only capability violations in the lambda calculus are environment extensions, ie. ambient authority, and actors suffer the exact same deficiency.

If you're talking about impure lambda calculus, then perhaps this is the case. But Bracha was, specifically, talking about pure subprograms.

The only way to use capabilities captured in a pure lambda-calculus is to hand them to the caller, who can possibly exercise them on your behalf (similar to how monads operate). That is, capabilities are always exposed to the caller. This fails to achieve the object capability patterns and reasoning and control over effects that actors model allows.

So, I don't see what problem actors solve in this context, nor do I see how actors solve the I/O problem which started this thread.

Your complaint is of the form: "XYZ would also solve this problem, therefore I don't see how actors solve the same problem." Bracha never claimed that actors are the only solution for interfacing pure subprogram with an effectful world, though they are apparently his favored solution.

You need some well defined language for specifying sequential messages to reason about I/O, at which point you've reinvented monads.

Not true. I'll grant that coordination and synchronization is a big part of reasoning about effects, and I acknowledge that sequencing messages is (in many domains) a useful pattern for coordinating their effects (open the door before walking through it). But there are many ways to coordinate and synchronize and sequence behavior that are not monadic.

Without violating the purity constraint, you could specify temporal properties on messages (temporal actors model). Or you could introduce Oz-like logic variables (or futures and linear-typed promise pipelining).

Sequencing a bunch of 'continuation' actors is one possibility, of course, and I agree would effectively be 'reinventing monads'.

Ocap patterns allow expression and precise control over effects

They don't in the presence of laziness [...] You need some abstraction which permits the specification of sequencing in the presence of laziness

If the exercising of a capability is lazy, that could be a problem (depending on the nature of the effect). But you can certainly have the message contents be lazily evaluated.

You mentioned moving Haskell IO to actor IO. Lazy messages offer approximately the same level of laziness as Haskell IO. The Haskell IO monad sequences logical effects, but the values and state are still quite lazy.
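
To illustrate with ordinary GHC code (nothing from Gilad's post; just Data.IORef and Debug.Trace): the do-block below fixes the order of the writes, but the value stored is an unevaluated thunk until print finally demands it.

    import Data.IORef
    import Debug.Trace (trace)

    main :: IO ()
    main = do
      r <- newIORef (0 :: Int)
      -- The effects below are strictly ordered by the IO monad...
      writeIORef r (trace "forcing the thunk now" (sum [1..100]))
      putStrLn "wrote an unevaluated thunk"
      v <- readIORef r
      -- ...but the stored value stays lazy; the trace fires only when 'print' demands it.
      print v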

The issue is interfacing a

The issue is interfacing a pure subprogram with an effectful world, not strict vs. lazy evaluation.

Yes, and reasoning about side-effects is only really problematic in the presence of lazy evaluation. One can reason about effects just fine in call-by-value settings if you forbid certain compiler optimizations.

The only way to use capabilities captured in a pure lambda-calculus is to hand them to the caller, who can possibly exercise them on your behalf (similar to how monads operate). That is, capabilities are always exposed to the caller.

I don't understand what this means. Every bound name is a capability in a lambda calculus, pure or impure. The pattern you describe is sort of monadic, but it has nothing to do with capabilities per se. I also don't see what is exposed to the caller, and why any such exposure could not be abstracted away using standard techniques that anyone would use in such a context, ie. type classes, functors, etc.

I also don't see why actors would be immune to this. I expect any actor would have a standard translation to a non-actor equivalent, though apparently this is disputed by some, given the Hewitt actor/NDTM thread.

Your complaint is of the form: "XYZ would also solve this problem, therefore I don't see how actors solve the same problem."

No, my complaint is that I don't see how actors solve the problem period, and from my understanding, making them solve the problem involves reinventing monads (or something like them to ensure ordering of effects). If they do somehow solve the problem, then perhaps I have a deficient model of actors in mind, but stateless actors seem to have a pretty straightforward interpretation in terms of objects/closures.

reasoning about side-effects

reasoning about side-effects is only really problematic in the presence of lazy evaluation

I don't find that assertion even remotely credible.

Concurrency, coordination, planning, resource management, deadlock, progress, security, robustness, partial failure modes and error propagation, disruption tolerance, modularity, resilience - ALL of these present 'real' problems for reasoning about side-effects.

Also, laziness does not necessarily hinder reasoning about side-effects. It really depends on the side-effects. I've developed a paradigm around the notion of lazy, demand-driven side-effects.

I don't see how actors solve the problem period

Best I can tell, you still haven't recognized which problem is being solved. Laziness certainly is not the issue. It wasn't even mentioned as an issue by Bracha. You seem to be under the impression that 'the problem' is ordering of effects.

But Bracha is quite clear about what problem is being solved: "provide a mechanism to interface pure functional programming to the impure dysfunctional world". He doesn't mention order-of-effects. A 'mechanism to interface' doesn't need to order effects by default.

So I will say this one last time: you are attacking the wrong problem. Coordinating effects is an important problem, sure, but it isn't 'the problem' Bracha specified.

Every bound name is a capability in a lambda calculus, pure or impure.

Names within pure lambda calculus aren't capabilities: they don't designate any authority, and they aren't first-class - you cannot communicate them as 'names'.

Assuming there were capabilities in lambda-calculus, they'd look like IORefs or YURLs. And a 'pure' lambda calculus certainly couldn't exercise the authority designated by such capabilities. The only communication out of a pure lambda expression is the return value. Therefore, the best it could ever do is return the capability and ask the caller to exercise it on its behalf.
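
In Haskell terms, the pattern looks like this (just a sketch; LogCap and plan are made-up names): the pure code never performs the effect itself; it only assembles a description of how the capability should be exercised and returns it, and the impure caller actually runs it.

    -- A "capability" here is an effectful operation handed in from outside.
    type LogCap = String -> IO ()

    -- Pure code: it cannot exercise logIt; it can only build an IO action
    -- (a description of the effect) and hand it back to the caller.
    plan :: LogCap -> Int -> IO ()
    plan logIt n
      | n > 0     = logIt ("positive: " ++ show n)
      | otherwise = logIt "non-positive"

    main :: IO ()
    main = do
      let logCap = putStrLn       -- the caller owns the real capability
      plan logCap 42              -- and exercises it on the pure code's behalf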

I expect any actor would have a standard translation to a non-actor equivalent [...] stateless actors seem to have a pretty straightforward interpretation in terms of objects/closures.

That's an unusual expectation. Very few languages offer standard translations, and the translations that do exist are very rarely fully property- and proof-preserving and homeomorphic - that is, they are not 'equivalent'.

But Bracha is quite clear

But Bracha is quite clear about what problem is being solved: "provide a mechanism to interface pure functional programming to the impure dysfunctional world". He doesn't mention order-of-effects. A 'mechanism to interface' doesn't need to order effects by default.

I'm sorry, but what could you possibly do with effects but specify their ordering? The meaning of most effects is opaque to the language. Effects happen "outside" the local scope, and at top level they occur "outside" the language itself, so all you can do is specify when they should occur relative to other effects.

Laziness causes problems for ordering even in sequential code, which isn't solved by actors that I can see, unless you force actor invocation to be call-by-value, which is why it seems to me that the evaluation strategy solves the problem. But again, perhaps I have a deficient model of actors in such a scenario. I'm not even going to touch the other scenarios you list until I'm convinced the sequential case is handled (though many of the concerns you raised are themselves effects, so I don't get why you'd list them in this context).

Names within pure lambda calculus aren't capabilities: they don't designate any authority, and you cannot communicate them as 'names'.

Yes, they absolutely are, though I don't know what you mean by communicating them as 'names'. Bound names are absolutely and unequivocally object capabilities; capabilities are just references in memory-safe languages. Check Mark Miller's thesis if you require an authoritative reference: he explicitly states that existing object capability languages are equivalent to the lambda calculus with mutable state (the reliance on mutable state is not fundamental to this correspondence, it's just pointing out that existing cap languages are impure).

The only communication out of a pure lambda expression is the return value. Therefore, the best it could ever do is return the capability and ask the caller to exercise it on its behalf.

This is where the interface under discussion comes in, except I'm not convinced this interface can be provided by actors without modelling some structure that will end up looking like monads or stream processors (or other historic Haskell means of I/O).

Instead of dancing around the subject, perhaps whoever believes actors are viable here can provide a trivial example of actor I/O that works regardless of evaluation strategy. That would suffice to convince me it's at least possible.

what could you possibly do

what could you possibly do with effects but specify their ordering?

For starters, you could specify that they are to 'happen'. To some extent, you can also specify where they are to happen. You can also work on the other half of the effects problem: receiving and processing input.

You can't reasonably order commands relative to one another except declaratively (e.g. put them on a timeline) or relative to observations (e.g. input events). To whatever extent you keep your effects logically monotonic, the relevance of ordering is diminished. Ordering of effects is important, but secondary to receiving input and producing output at all.

Sequential code is ordered by observations - i.e. progressing to the next step when the last one is observably complete. We make more use of this pattern than is warranted in most domains - not because it's a good thing, but because it's the path of least resistance with most state-of-the-art languages.

meaning of most effects is opaque to the language

Not really. If you send a message and expect a response, the fact that you expect a response (and perhaps even the type of response) will not be opaque. In general, the conditions upon which ordering of events might hinge will always be observable within the language.

Laziness causes problems for ordering even in sequential code, which isn't solved by actors that I can see

Laziness doesn't cause any problems, even for sequenced effects, if the logical ordering of events is bound at the point of the call. I agree things could get funky if you bind ordering-of-effects at a point of 'force'.

I still don't believe 'laziness' is even on the table as an issue.

existing object capability languages are equivalent to the lambda calculus with mutable state (the reliance on mutable state is not fundamental to this correspondence, it's just pointing out that existing cap languages are impure)

Sure, mutable state within the program isn't critical. But effects are critical - there is no authority to protect or designate without effects. Pure lambda calculus is effect-free - it doesn't exercise any authority (lest it be impure). To claim it obeys the object-capability discipline is at best a useless truth (arguing fine points of definitions). Further, it is quite misleading - like advertising your lambda-calculus program has "never lost" a game without also saying it never played. So stop.

Names in lambda-calculus, such as 'x' in 'λx→Body', are not capabilities. Capabilities can be sent in messages, but lambda names cannot be sent in messages - they won't have the same meaning to the recipient as to the sender. Now, you might reasonably argue that you meant what is bound by the name. For an impure lambda-calculus, a name might bind a specific 'Ref' type (a'la ML, or IORef from Haskell), which would be a capability. A pure lambda-calculus can transport such capabilities, but the 'names' in general bind values. An arbitrary value is certainly not a capability; in general, values are forgeable and don't designate authority. You suggest Mark Miller's thesis as an authoritative source. Page 65: "A capability is a reference to non-data" (with further definitions of 'reference' and 'data').

Another article by Mark discusses names in lambda calculus and more generally: "Lambda-based programs never communicate names to each other, even though they use only names to bring about their communications."

I'm not convinced this interface can be provided by actors without modelling some structure that will end up looking like monads or stream processors (or other historic Haskell means of I/O).

I agree that some actor continuation-passing code will look a lot like monads (depending on the domain). But if you're distinguishing two designs, you should identify the differences in addition to the similarities.

"A capability is a reference

"A capability is a reference to non-data" (with further definitions of 'reference' and 'data').

There is disagreement in the community as to whether data constitutes a capability. For instance, EROS and CapROS have 'number capabilities'. Furthermore, every value of a programming language occupies storage and so induces an allocation effect. The ability to read/write data are implicit capabilities to data pages, and the ability to execute instructions is an implicit capability to a CPU slice (EROS's schedule capability). There is no sensible interpretation of 'data' which does not carry an implicit effect, so the distinction between data and non-data is meaningless.

What Mark is trying to do by excluding data is to limit the scope of access control to exclude secrecy and other properties based on knowledge. If you acknowledge data as a capability, you are tempted to apply the confinement to manage secrets as well, and he is (rightly) very suspicious of whether this is even intrinsically possible due to covert channels.

in general, values are forgeable and don't designate authority

Whether they designate authority is entirely dependent on the operations you can perform on the values. For instance, if your language supports arbitrary precision integers, and you now add an 'identical?' operation to your language which compares pointer values, two integers which are otherwise equivalent are not identical. You can no longer "forge" such an integer, and this data has become a capability, in your parlance.

Thus, this distinction is artificial. Every reference in a memory safe language is a capability, it's just some are so fully encapsulated as to be fully interchangeable with each other. This does not mean they are 'forgeable', and thus not capabilities, merely that they are widely distributed capabilities which carry little authority.

Arguing for definitions

Choosing all data to be a capability puts you at a low level of abstraction. I endorse David's definition of capability, but might use 'values' instead of 'data'. To me, capabilities are references to things that will be affected by monadic effects. In high level code, there will be large values around that do not fit into registers and that do not require monadic effects to access. At a low level, though, you might treat heap access as a capability. I understand that some languages (e.g. BitC) try to unify these levels, but that's not the only possibility.

Also, whether or not values are constructible (forgeable) is irrelevant as to whether I consider something a capability. Your large integer example constitutes a capability only if you access it as one, in which case it will be unforgeable by construction. I don't consider arbitrary ADTs to be "capabilities."

To me, capabilities are

To me, capabilities are references to things that will be affected by monadic effects.

Monads are rather pervasive, so this would constitute a much larger class of programs than David's definition allows. I suspect you will then want to pick and choose certain monads that have "true effects". I ask, to what benefit?

Consider mapM, which abstracts over the particular type of monad, or analogously, map in a type and effect system: is map a program that accesses capabilities or not? If instead of executing the map over the bytes in a file, I execute it over a list of "file commands" which another I/O program then interprets, I have transitive authority over that file. The whole point of running programs is to trigger side-effects; you cannot then say that any one piece of data does not trigger an effect, because it will be interpreted by another part of the program to do just that.

Furthermore, the view that we can't treat purely in-memory types as worthy of "access control" just because they have no intrinsic side-effects is unnecessarily limiting. Rather I take the view of Lightweight Static Capabilities, that access control is just another type of property on a type. Whether that type is "data" or has "true effects" is irrelevant.

In high level code, there will be large values around that do not fit into registers and that do not require monadic effects to access.

In a high level typed language, the register contents are typed, so a program accessing registers is in a monad, and thus involves capabilities in your view.

Also, whether or not values are constructible (forgeable) is irrelevant as to whether I consider something a capability.

Well, forgeability is in the very definition of capability, so I wouldn't go that far. I know the traditional view on forgeability is unnecessarily restrictive, I just haven't given much thought to precisely how it needs to be relaxed.

Monads are rather

Monads are rather pervasive, so this would constitute a much larger class of programs than David's definition allows. I suspect you will then want to pick and choose certain monads that have "true effects". I ask, to what benefit?

I used 'monadic' to quickly get an idea across, but I probably should have just called them 'effects'. Essentially all I'm saying is that with monadic style, effects are made explicit and seen at top level. Capabilities can be obtained as a stylized use of effects.

If instead of executing the map over the bytes in a file, I execute it over a list of "file commands" which another I/O program then interprets, I have transitive authority over that file.

I'm lost here (which I suspect is a result of my mentioning 'monadic' without qualification).

Furthermore, the view that we can't treat purely in-memory types as worthy of "access control" just because they have no intrinsic side-effects is unnecessarily limiting.

I'm not saying that you can't set up access controls on memory accesses, but I'm not in favor of calling every value access a "capability access". Just the ones that are explicitly capabilities.

I used 'monadic' to quickly

I used 'monadic' to quickly get an idea across, but I probably should have just called them 'effects'. Essentially all I'm saying is that with monadic style, effects are made explicit and seen at top level. Capabilities can be obtained as a stylized use of effects.

I know what you meant, but that still rules out all the examples from the Lightweight Static Capabilities paper, and these are widely accepted as capabilities, so there is clearly a contradiction here.

I'm lost here (which I suspect is a result of my mentioning 'monadic' without qualification).

I'm not sure where I lost you either. mapM is parameterized over the effect type, and I provided two examples, the first where I explicitly induce an effect within mapM, and a second where I do not and the effects are instead executed by another program after mapM completes. These two programs have exactly the same input/output behaviour, but in the first mapM is accessing capabilities (file I/O), and in the second it is not accessing any capabilities, according to yours and David's definitions. Instead, I would say that it is merely operating on a different set of capabilities.
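
Roughly, in Haskell (my reconstruction of the two programs, with made-up names and a made-up FileCmd type): direct induces the file effect inside mapM itself, while planned runs mapM in a pure monad and only builds commands, which interpret executes afterwards; the observable I/O ends up the same.

    import Data.Functor.Identity (Identity(..))

    data FileCmd = WriteLine FilePath String          -- hypothetical command type

    -- Variant 1: the effect happens inside mapM itself (the IO monad).
    direct :: FilePath -> [String] -> IO [()]
    direct path = mapM (\s -> appendFile path (s ++ "\n"))

    -- Variant 2: mapM runs in a pure monad and merely builds commands...
    planned :: FilePath -> [String] -> [FileCmd]
    planned path xs = runIdentity (mapM (\s -> Identity (WriteLine path s)) xs)

    -- ...which a separate program interprets later.
    interpret :: [FileCmd] -> IO ()
    interpret = mapM_ (\(WriteLine p s) -> appendFile p (s ++ "\n"))

    main :: IO ()
    main = do
      _ <- direct "out1.txt" ["a", "b"]
      interpret (planned "out2.txt" ["a", "b"])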

I'm not saying that you can't set up access controls on memory accesses, but I'm not in favor of calling every value access a "capability access". Just the ones that are explicitly capabilities.

And I am arguing that your definition is limiting at best, and contradictory at worst, and the only sensible definition is that every value is accessed by capability in a memory-safe language. The examples from lightweight static capabilities in no way induce effects, and they are not executed in an "effect monad", and yet they satisfy every other meaningful criterion of capabilities.

Capabilities are simply references which simultaneously designate and authorize access to an object. This definition applies to URLs and distributed object references, references in languages, handles in operating systems, and so on. I can provide a capability URL to an immutable integer on one of my servers right now, and this would not be a capability in your parlance because it does not induce an effect.

You might say that it induces an effect by requiring the server to load the object when requested and return a result, but by that same logic, accessing a string reference on your machine requires loading the memory location into the cache on your CPU.

Capabilities are simply

Capabilities are simply references which simultaneously designate and authorize access to an object.

This seems to be the major difference. You've adopted an extensional definition of 'capability' -- a capability is any unforgeable reference that designates and authorizes access. You then conclude that certain values satisfy this definition. I'm advocating an intensional definition -- there are certain things called capabilities which designate effects.

Another piece you're maybe missing is that these effects specified by capabilities can be implemented by the invoking context. So you can provide a handler for "access database" that merely looks up a value out of another table value with no "real world effect" but still uses the effect mechanism. I only skimmed the lightweight paper, which looks to be more about type idioms than capabilities, so let me know if this doesn't address your objection.

I still don't understand the point of your mapM example. Since my definition is that capabilities are certain effects, why is it surprising or wrong that I wouldn't consider mapM applied with pure string operations to be using capabilities?

You might say that it induces an effect by requiring the server to load the object when requested and return a result, but by that same logic, accessing a string reference on your machine requires loading the memory location into the cache on your CPU.

I'm using "effect" as a term of art. An effect is an explicitly specified out-of-band side-effect. Warming the machine and priming the cache are not effects.

You've adopted an

You've adopted an extensional definition of 'capability' -- a capability is any unforgeable reference that designates and authorizes access. You then conclude that certain values satisfy this definition. I'm advocating an intensional definition -- there are certain things called capabilities which designate effects.

I understand your definition. I believed it myself for a while. Unfortunately, it rules out a large class of references which are widely considered capabilities.

I will set aside the question of mapM for now, and simply focus on one question: do you consider the examples of "Lightweight Static Capabilities" to be capabilities?

If you do, then your definition of capability is insufficient since it incorrectly classifies these examples. If you do not, then I claim you are at odds with the wider community on what constitutes a capability.

I'm using "effect" as a term of art. An effect is an explicitly specified out-of-band side-effect. Warming the machine and priming the cache are not effects.

You didn't address the original question which the analogy was meant to answer. I said B implies C by correspondence. You said here, "not C". I cannot then infer whether you believe B or "not B", which is the real question I want answered.*

If "B", then your definition of capabilities is insufficient because it incorrectly classifies capability URLs as not being capabilities.

If "not B", this is a contradiction and an abstraction violation, because you are implying that the mechanics of near/far references underlying the language matters in one case, but not the other.

* B = dereferencing is not a side-effect, ie. HTTP request/response is not a side-effect for a remote reference, nor is a local fetch on a memory address, at the level of abstraction of the high-level language which has only opaque references.

I understand your

I understand your definition. I believed it myself for a while.

Well, I don't "believe" in either definition. I endorse a particular definition because I find it more useful. If every value in a language is a capability, why not use the term 'value' for that?

I will set aside the question of mapM for now, and simply focus on one question: do you consider the examples of "Lightweight Static Capabilities" to be capabilities?

Obviously I wouldn't use what I call capabilities to implement that pattern. Doesn't your definition prevent you from introducing any explicit 'capability' concept in a language that includes all exemplars?

If you do not, then I claim you are at odds with the wider community on what constitutes a capability.

Maybe so :)

Well, I don't "believe" in

Well, I don't "believe" in either definition. I endorse a particular definition because I find it more useful. If every value in a language is a capability, why not use the term 'value' for that?

Good question, which I answered in David's post, but will repeat here: capabilities require memory safety, but also have other requirements pertaining to isolation and integrity properties (ambient authority in particular). This is where I think you over-constrain by requiring effects, because I consider this an impoverished subset of all capabilities.

I don't claim to have an airtight, rigorous definition yet, but I have plenty of examples which do not fit yours and David's over-constrained definitions, and don't quite fit the definitions David cited from Mark Miller's thesis, "Robust Composition", if taken too literally. The most general definition used on capability lists is the one I listed above: a reference that combines designation with authorization.

I'm writing up a post to cap-talk now citing this thread, because I'm now curious what others think of these sorts of "capabilities". I will post it here after some discussion.

that still rules out all the

that still rules out all the examples from the Lightweight Static Capabilities paper, and these are widely accepted as capabilities, so there is clearly a contradiction here.

I do not believe they are widely accepted as capabilities. (Can you justify the assertion that they are?)

your definition is limiting at best, and contradictory at worst

The value of any definition is a function of how limiting it is. The way to de-value a word - make it utterly useless - is to claim "everything is a".

I can provide a capability URL to an immutable integer on one of my servers right now, and this would not be a capability in your parlance because it does not induce an effect.

It would induce effects: delay, disruption (failure modes), potential for non-existence at a future point in time. That is, this is a capability distinct from the value returned precisely because access to it is effectful; the value obtained via such a capability varies over time. (Since values do not vary over time, clearly this is not purely a capability to a value.)

[I'm assuming you want to ignore the effects on server, such as access logs.]

I do not believe they are

I do not believe they are widely accepted as capabilities.

Question: is a revocable wrapper to an integer a capability or not? This is definitely considered a capability everywhere. The nature of a revocable wrapper is not much different from the nature of capabilities in that paper, so claiming that the former is a capability, but the latter is not is mystifying.

The value of any definition is a function of how limiting it is. The way to de-value a word - make it utterly useless - is to claim "everything is a".

I disagree. There is some value gained by classifying something as "a capability" in my parlance (even degenerate forms like integers), since it carries with it all the implications of memory safety + a few more (ambient authority and such). But the majority of the value is in the type of capability it is, ie. its properties, not in it merely being a capability.

It would induce effects: delay, disruption (failure modes), potential for non-existence at a future point in time.

The exact same properties exist for local values, just in different probabilities. But let's say, for the sake of argument, that the computer with this remote value is sitting right next to your own computer, and connected directly via a cross-over cable. Is it still a capability? The probabilities for delay and disruption are almost as low as in the case of local memory.

I would like a quantifiable justification on the probabilities of delay/failure/etc. that distinguish capabilities from non-capabilities.

Capabilities are semantic

is a revocable wrapper to an integer a capability or not?

Of course it is. [Assuming the other ocap properties hold.]

The nature of a revocable wrapper is not much different from the nature of capabilities in that paper.

I do not believe your claimed analogy holds.

For the act of 'revocation' to make sense, you need a model of time, and possibly of concurrency (unless you insist on linear references). What model are you assuming of the paper?

The exact same properties exist for local values, just in different probabilities.

Indeed. We use references as an implementation detail for value semantics because we can maintain the illusion that the reference is the value. The problem of 'disruption' for local values is not especially critical because you don't face 'partial failure' semantics - almost anything that disrupts access to the local value will also take out the whole computation.

The references used under-the-hood could be called 'capabilities' from that perspective. But you are making a serious mistake: you are confusing the implementation with the semantics. Under-the-hood, there are plenty of things that are not very capability-like as well (such as garbage collection, ambient authority for resource allocation (memory, CPU), orthogonal persistence).

the computer with this remote value is sitting right next to your own computer, and connected directly via a cross-over cable. Is it still a capability? The probabilities for delay and disruption are almost as low as in the case of local memory.

The probability of partial failure is significantly greater. If partial failure semantics enter the program at the point of accessing the reference, it would clearly be distinct from a value - a 'capability' if it meets the unforgeability and four connectivity requirements (initial conditions, parenthood, endowment, introduction).

But, assuming you hid the 'reference' under-the-hood and provided an effective illusion (from the application developer's perspective) of value semantics, then you could call it an 'implementation' of the value semantics. Achieving this illusion would still be feasible for LAN access - e.g. one way to provide the illusion despite greater risk of partial-failure would be to promote a partial-failure into a whole-program failure.

I would like a quantifiable justification on the probabilities of delay/failure/etc. that distinguish capabilities from non-capabilities.

It isn't the probability that makes the difference. It's the semantics that matter. (I would agree with Matt's implicit argument that the abstraction you are working at counts for everything.) Probabilities just make a difference on whether the semantics are feasible - whether the illusion provided by the runtime will be robust and efficient.

The references used

The references used under-the-hood could be called 'capabilities' from that perspective. But you are making a serious mistake: you are confusing the implementation with the semantics.

I don't think so. My claim is that forbidding immutable data from capabilities leads to a contradiction at worst, and excludes a large class of programs with clear capability semantics, at best. I can implement immutable structures with arbitrarily complicated access control policies, yet unless operating on them induces effects, in your parlance these programs are not using capabilities.

For instance, I can wrap file I/O by using a global array of all open file handles, and returning the index into this table to a client. The client sees an opaque type. Every operation on this file handle returns a future, with no side-effects occurring until we execute a "run" command. I consider the handle returned to clients a capability, and you would not. According to your definition, a capability is used only when "run" is executed.

The exact same interface can be implemented by returning the actual file handle instead of an index, and inducing those effects in-place instead of deferring them to "run". Now all of a sudden that opaque handle is a capability according to your definition. Do you not consider this an abstraction violation?

It isn't the probability that makes the difference. It's the semantics that matter.

I agree. Let's consider two scenarios:

  1. suppose we wrap all references, both local and remote, in a monadic-like abstraction so that arbitrary delays and failures can be handled automatically. They share the same semantics now. Are both near and far references capabilities?
  2. suppose that we require every reference, whether local or remote, to handle failure explicitly (each deref returns Maybe 'a, for instance). They share the same semantics now. Are both near and far references capabilities?

Of course, the second is trivially transformed into the same as the first when executed in a Maybe monad, so unless you answer both yes or both no, you have a contradiction. I am merely disputing your assertion that remote refs to immutable values are capabilities while local ones are not. Either they both are, or they both aren't.

The point is that I consider

The point is that I consider a function using capabilities to be doing something external. Even if I'm just accessing an immutable structure, I expect that if I'm doing it with capabilities, someone somewhere could be monitoring how often I've accessed it. Thus using a capability 'getFive' that always returns 5 is not the same as the literal 5.
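
For concreteness, a small Haskell sketch of what I mean (mkGetFive is a made-up name): getFive always yields 5, but because it is a capability, whoever constructed it can observe every access - something a literal 5 never allows.

    import Data.IORef

    -- Build a 'getFive' capability together with a way for its creator to
    -- count how often the capability has been exercised.
    mkGetFive :: IO (IO Int, IO Int)
    mkGetFive = do
      count <- newIORef (0 :: Int)
      let getFive = modifyIORef' count (+ 1) >> return 5
      return (getFive, readIORef count)

    main :: IO ()
    main = do
      (getFive, accesses) <- mkGetFive
      _ <- getFive
      _ <- getFive
      n <- accesses
      print n   -- 2: uses of the capability are observable, unlike uses of the literal 5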

Your example of file I/O is stated slightly wrong. In my lingo: I can write both of your variants using capabilities. I can also implement the pure variant without them.

Regarding your "two scenarios": 1) Both are, 2) Both still are. The monadic-like abstraction IS the capability system.

The point is that I consider

The point is that I consider a function using capabilities to be doing something external. [...] Thus using a capability 'getFive' that always returns 5 is not the same as the literal 5.

I understand, though I disagree that this is the defining characteristic of a capability. This is a defining characteristic of a side-effect, but I consider the set of capabilities to be larger than the set of side-effects, while you think the opposite, ie. capabilities as a stylized use of side-effects.

Regarding your "two scenarios": 1) Both are, 2) Both still are. The monadic-like abstraction IS the capability system.

Good, at least that's consistent. I was simply objecting to David implying that a remote value is a capability while the local value is not, which seems inherently inconsistent. Note we are talking about an immutable value here in case you weren't following the thread, so this seems inconsistent with your prior statements re:side-effects.

So is the list in the list monad a capability?

I think you misunderstood my

I think you misunderstood (and possibly still misunderstand) my position on the relationship between capabilities and side-effects. Capabilities are a particular mechanism which in general may cause side effects. I'm not saying you should look for side-effects to decide if some arbitrary thing is a capability.

Arbitrary monads are not capabilities. The capability system I have in mind is a particular monad.

Re: Scenarios

Every operation on this file handle returns a future, with no side-effects occurring until we execute a "run" command. I consider the handle returned to clients a capability, and you would not. According to your definition, a capability is used only when "run" is executed.

I'd consider the handle to be a capability. Lambda-calculus can transport capabilities. You can, in a pure language, freely wrap capabilities inside descriptions of what to do with them at some future time. But pure lambda calculus cannot exercise authority - cannot actually 'use' the capability. As you say, by my definition the capability is not 'used' until "run" is executed.

Bringing this back inline with the OP, the difference in when and where authority is exercised is what distinguishes pure lambda-calculus from a stateless actors model. Actors can directly exercise capabilities.

Whether the exercise of a designated authority is 'deferred' does not, in my mind, determine whether an individual reference is or is not a capability. But we should distinguish between individual capabilities and the gestalt concept of a 'capability system'.

Pure lambda calculus can transport capabilities, can wrap them in behaviors for someone else to execute, but does not constitute a capability system. (Of course, given Turing power, it can implement one, e.g. via monadic representation of effects. What matters is the abstraction and expression developers are using.)

The exact same interface can be implemented by returning the actual file handle instead of an index, and inducing those effects in-place instead of deferring them to "run". Now all of a sudden that opaque handle is a capability according to your definition. Do you not consider this an abstraction violation?

Is this a quibble on the nature of 'reference'?

Under-the-hood of a typical ocap language implementation, pretty much all 'references' are simply 'indexes' into an address space (aka 'pointers'). What matters is that the capability semantics are maintained: Unforgeable - developers lack ambient authority to turn an index into a reference. Communicable - the reference can be introduced or granted and mean the same thing to the recipient as it did to the sender. (This means, for example, that an index into an array of file handles named in dynamic scope would not be a capability.)

  1. wrap all references in a monadic-like abstraction so that arbitrary delays and failures can be handled automatically
  2. we require every reference, whether local or remote, to handle failure explicitly (each deref returns Maybe 'a, for instance).

the second is trivially transformed into the same as the first when executed in a Maybe monad, so unless you answer both yes or both no, you have a contradiction

Not all programs that can be written in (2) have an equivalent program in (1). Any transform function from (2) to (1) is partial. Using a 'Maybe' monad to wrap case (2) covers only a tiny subset of all possible programs that can be written in (2). Developers in scenario (2) have much more freedom on how to handle a partial failure than do developers in scenario (1).

Further, your argument would not hold even if you could transform (2) to (1). In general, the ability to 'transform' A into B does not mean that B has the same properties as A. To argue two systems of expression have the same properties, you would need to prove the transform is proof preserving and (roughly) homeomorphic (i.e. that a small change in B has a corresponding small change in A, and vice versa). (cf. gimbal lock in mathematics) - I recall someone using a dedicated name for the concept, with regards to the relationship between language and runtime, but I'm blanking on it.

I am merely disputing your assertion that remote refs to immutable values are capabilities while local ones are not.

I never made that assertion.

If your developers explicitly treat local references to values as 'references' - e.g. with dereference operations - and you could transparently implement the various capability patterns (such as the revocation pattern you mentioned earlier) then the references would indeed be capabilities. But pure lambda-calculus doesn't allow the latter half of that requirement. Values in pure lambda-calculus are certainly not references. Accessing a value in pure lambda calculus is not the same as a dereference operation. It may be that references are used under-the-hood, in the underlying lambda-calculus implementation, but that decision is irrelevant to the value semantics being used by developers. I have the impression that you are mentally substituting 'reference' wherever I say 'value', begging the question of whether they are distinct. (Certainly, values must be distinct from references, otherwise we would regress infinitely when trying to dereference.)

With regards to 'remote' references, I've not said those would be any more or less 'capabilities' than local ones. I do say that any attempt to provide value semantics using remote references will fail. It will fail due to the fallacies of distributed computing, partial failure and disruption, observing partial computations, performance, and heuristic (imperfect) GC issues.

Consider: in Haskell we can develop code of the form Maybe (IO a). To begin executing the embedded IO operation, we must gather and compute just enough of its value dependencies to distinguish it from Nothing.

If we use only local references to values, we only need to compute just enough of each value to prove it isn't Nothing, then we can begin executing the IO while computing the rest of each value lazily. Laziness allows memory efficiency (garbage-collecting old parts of the value) and reduces latency before we start executing. That is, the use of local references provides a robust, high-performance implementation of value semantics.
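
A rough sketch of what I mean (the names are mine, purely for illustration): the pattern match only needs enough structure to rule out Nothing, and the work inside the IO action is forced lazily, after execution has begun.

    runMaybeIO :: Maybe (IO ()) -> IO ()
    runMaybeIO Nothing    = putStrLn "nothing to run"
    runMaybeIO (Just act) = act             -- start running right away

    example :: Maybe (IO ())
    example = Just (mapM_ print [1 .. 5])   -- the list is built as it prints

    main :: IO ()
    main = runMaybeIO example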

If, on the other hand, remote references are used for values, then a value might prove to be Nothing only much later due to disruption or partial-failure in the gathering operation. Without violating the Maybe abstraction, you cannot begin executing the IO operation until you finish gathering the values. (A similar story can be told if the Maybe is to catch type-safety errors, since you cannot safety-validate remote values.) Ultimately, maintaining the abstraction will be inefficient and high-latency.

In practice, that sort of inefficiency is unacceptable. People will instead take the risk of violating the abstraction; the implementation will be (de facto) type-unsafe. Any illusion that remote references to values are the same thing as values will, in practice, certainly be leaky and broken.

For similar reasons, explicit laziness semantics for value computations will also be broken once you go distributed. That doesn't mean you can't have lazy evaluation, only that it can't be part of the value semantics. (E.g. lazy total functional computations are compatible with distribution.) Once we go distributed, it is critical that we distinguish between a capability and a function value (or, equivalently, any coinductive type).

Pure lambda calculus can

Pure lambda calculus can transport capabilities, can wrap them in behaviors for someone else to execute, but does not constitute a capability system.

The lambda calculus does not support operations on integers either, but you do not consider them capabilities. You have said I am making abstraction violations by mixing implementation issues with semantics, but this seems to be the same sort of mistake. We can just as easily model an entire file system, operating system, and remote hosts as we can the integers in the LC, so I don't see why the latter are not referenced by capabilities but all the former are.

I think we fundamentally agree that capabilities are references that both designate an object and carry authority controlling access to said object, but we disagree on what constitutes "authority". You and Matt have a very strict view which I don't share.

Let me pose a hypothetical which I posted to cap-talk: your machine raises an interrupt on divide-by-zero. As a result, most safe languages perform a zero check before division (or they catch the trap), and raise an error on divide-by-zero. Suppose now that there was a means to create an integer that was guaranteed not to be zero. Would this not constitute an authority to bypass the divide-by-zero check? Would this "non-zero integer" then be a capability? Why is "authority" limited to side-effects?
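
For concreteness, a hedged sketch of that hypothetical in Haskell (all names are mine): a smart constructor is the only way to obtain the non-zero witness, so a division that demands one needs no runtime zero check of its own.

    newtype NonZero = NonZero Int

    nonZero :: Int -> Maybe NonZero
    nonZero 0 = Nothing
    nonZero n = Just (NonZero n)

    -- "Unchecked" division: the witness carries the proof of non-zero-ness.
    divNZ :: Int -> NonZero -> Int
    divNZ x (NonZero d) = x `div` d

    main :: IO ()
    main = print (fmap (divNZ 10) (nonZero 2))   -- Just 5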

Finally, you have commented before that access to reflection should be controlled via capabilities, and yet reflection info is immutable, and does not involve any side-effects. How do you reconcile this?

Is this a quibble on the nature of 'reference'?

No, it's merely an observation that capabilities are no longer a reliable reasoning tool. The developer cannot view any function signature and verify, "this function uses capabilities, and this other function does not". In my view, capabilities are everywhere, and the developer can say, "this function uses these *types* of capabilities, while this other function does not". In its most basic form, capability security is purely a restriction on the initial conditions of a pure LC program, and the available operations are the "authorities".

Any transform function from (2) to (1) is partial.

I disagree: just sit in a wait loop until the remote ref resolves. I didn't mean the Maybe monad literally, just something like it. The only differences are the probabilities of failure here, and thus the rate of progress.

Such a translation may have different temporal or spatial semantics, but these semantics do not by themselves classify when something is a capability, so I think they're immaterial to the point. I agree they're important in a larger sense though.

If your developers explicitly treat local references to values as 'references' - e.g. with dereference operations - they would still be capabilities. But pure lambda-calculus doesn't involve doing so.

So, adding one level of indirection makes anything a capability? That sounds bizarre to me. As for whether you said it, I believe it was this post, where I said, "I can provide a capability URL to an immutable integer on one of my servers right now, and this would not be a capability in your parlance because it does not induce an effect", and you replied, "It would induce effects: delay, disruption (failure modes), potential for non-existence at a future point in time". I inferred from this that you consider a remote ref to an immutable value to be a capability because you consider it to induce effects, while you consider the local ref to not be a capability (from prior statements). Where did I misstep?

I will not comment on the actor model since so many comments are at odds with my mental model that I will have to review it first, particularly regarding integration with the LC.

No such thing as "one" level of indirection

adding one level of indirection makes anything a capability?

There is no such thing as "one" level of indirection in an object capability system. That is, you either have zero levels of indirection OR transparently an arbitrary number. There is no middle ground.

lambda calculus does not support operations on integers

Number values are common in most lambda-calculus languages, and certainly in functional programming in general. But, even without that, we can model Church number values in lambda calculus as simple terms.

We can just as easily model an entire file system, operating system, and remote hosts as we can the integers in the LC

No, we certainly cannot. We can compose operations on numbers without any need for linearization. Issues of indeterminism and concurrency heavily interfere with modeling file systems and operating systems as easily as numbers.

your machine raises an interrupt on divide-by-zero. As a result, most safe languages perform a zero check before division (or they catch the trap), and raise an error on divide-by-zero. Suppose now that there was a means to create an integer that was guaranteed not to be zero. Would this not constitute an authority to bypass the divide-by-zero check?

Authority? Perhaps. Capability? Probably not. For clarity: I am using 'capability' as synonymous with 'object capability', not any of the multifarious other things historically called 'capabilities'.

A proof of a non-zero denominator passed to an 'ambient' division operation would much more closely match the model of certificate authority (cf. SPKI). That is, the proof of non-zero value might serve as a certificate authorizing use of the unchecked division code.

Would this "non-zero integer" then be a capability?

No.

Why is "authority" limited to side-effects?

Because communication effects (aka 'side effects', but I prefer the connotation of these effects being more purposeful) are the only phenomenon in computation systems that you can meaningfully restrict. Turing equivalence tells us that we all have authority to compute the same values (and run the same simulations of file systems). But we don't have equal authority to access information or manipulate it.

('Information' is distinct from 'value' in that information has a history and a future, a source and a destination, time and space - a relationship with the world. A value is just a platonic existence. There is a similar difference between a random number and an arbitrary number.)

you have commented before that access to reflection should be controlled via capabilities, and yet reflection info is immutable, and does not involve any side-effects. How do you reconcile this?

My reasoning is roughly: communication effects always bring untrusted and trusted code together in the same system. (This isn't a mathematical law, but Greenspun's is right up there with Parkinson's and Murphy's laws. A useful system will have code-distribution. Period.) In any rich system, code will tend to mix in a rather ad-hoc fashion. Your untrusted code should not have authority for arbitrary reflection on references to my objects, and vice versa.

Ultimately, this argument depends on communication effects and dynamism in an open system.

If we had small, closed systems, a reflective capability isn't significantly different from reflection as an ambient authority. But I have another argument that (in short) goes: at scale, all systems are essentially open, because there are many components and many developers (and code a few years old might as well have been written by a stranger) and nobody has a clear view of the whole system.

The developer cannot view any function signature and verify, "this function uses capabilities, and this other function does not". In my view, capabilities are everywhere, and the developer can say, "this function uses these *types* of capabilities, while this other function does not".

In contrast: in my view, a pure function application never uses any capabilities. And if we start talking about *types* of capabilities, we're probably talking about 'effect typing' (such as idempotent caps or transactional caps) and/or 'substructural typing' (such as linear or single-use caps). [And since Matt felt need to clarify it, I'll do so as well: capabilities aren't required to cause effects, but the potential is always there.]

Just as you see my view as overly restrictive, I see yours as overly general.

capability security is purely a restriction on the initial conditions of a pure LC program, and the available operations are the "authorities"

That little summary would also include information-flow security, type-safety, proof-carrying code, certificate authority, and so on - none of which involve references that designate authority to interact with the object referenced.

just sit in a wait loop until the remote ref resolves

This would not result in an equivalent program.

The only difference are the probabilities of failure here, and thus the rate of progress.

It's a very significant difference, both in probability and consequence.

The probability of partial failure due to disruption or administration jumps from 'nearly zero' to 'nearly inevitable' by the time you reach about ten computers or a few hundred meters of distribution.

you consider a remote ref to an immutable value to be a capability because you consider it to induce effects, while you consider the local ref to not be a capability (from prior statements). Where did I misstep?

You misstep somewhere in the "from prior statements" leap.

From my prior statements, the conclusion (about my position) should have been: "not all values are references" and "values used in a simple lambda application are never references". Local references can be capabilities - that would not contradict this argument.

Since values are not references, your argument about 'references' in lambda-calculus leads to my impression that you are confusing implementation details with semantics. It is an implementation detail that values may be stored or implemented using references under-the-hood. But the semantics are still value-passing - e.g. we don't need to dereference to interact with the value. (If we did need to dereference a value to interact with it, we would have infinite regression. Thus, a distinction between references and values is essential.)

Ultimately, I understand your entire argument about local value refs vs. remote value refs to be a question about implementation for LC's value passing semantics. We may have been arguing past one another due to this. My conclusion, however, should be fairly clear: Local references make a decent substrate for implementing value semantics. Remote references make an awful platform for implementing value semantics.

There is no such thing as

There is no such thing as "one" level of indirection in an object capability system.

That's not what I meant. You said that if accessing a local value required a dereferencing operation, then it's a capability. This is merely adding a level of indirection to values, and it seems bizarre to use this as the defining characteristic of a capability.

We can compose operations on numbers without any need for linearization. Issues of indeterminism and concurrency heavily interfere with modeling file systems and operating systems as easily as numbers.

An LC implementation of file systems with a pure LC implementation of monads is certainly possible. I'm not sure why you'd think otherwise. LC is Turing complete, so of course it can implement all the semantics necessary. There is no wrapping or threading of external values, so did the capabilities suddenly disappear?
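
As a rough illustration of the kind of model I mean (toy code, my own names, assuming the mtl and containers packages): a purely functional "file system" threaded through a State monad.

    import qualified Data.Map as M
    import Control.Monad.State

    type Path = String
    type FS   = M.Map Path String

    writeFS :: Path -> String -> State FS ()
    writeFS p s = modify (M.insert p s)

    readFS :: Path -> State FS (Maybe String)
    readFS p = gets (M.lookup p)

    demo :: (Maybe String, FS)
    demo = runState (writeFS "/etc/motd" "hello" >> readFS "/etc/motd") M.empty

    main :: IO ()
    main = print (fst demo)   -- Just "hello"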

But we don't have equal authority to access information or manipulate it.

I agree, so access to information or ability to manipulate certain kinds of information is authority. And all authority should be conveyed by capabilities in a capability-safe language. I'm sure you've seen my latest reply on cap-talk by now, so you know how I think this connects directly with integers and the running example I've been using.

That little summary would also include information-flow security, type-safety, proof-carrying code, certificate authority, and so on

Capability security spells out the safety conditions, while you are describing the mechanisms to verify said conditions, so they're not at odds.

This would not result in an equivalent program.

Since this feels important, I'm going to focus on this only for now: how do you define equivalency between these two programs such that they are not equivalent? Aside from different temporal semantics, they look the same to me. Every value needed for progress is migrated on dereference, so it's a distributed memory of sorts, instead of it all being local. I wouldn't design a program like this, but it works for illustrative purposes.

If our level of abstraction is a cluster with distributed memory as above, do you consider communications happening within the cluster to be mediated by capabilities, or only communications leaving the cluster?

if accessing a local value

if accessing a local value required a dereferencing operation, then it's a capability. This is merely adding a level of indirection to values, and it seems bizarre to use this as the defining characteristic of a capability.

Wrong. 3 is a value. Merely a level of indirection - while remaining entirely within the 'values' abstraction - would look like Identity 3 or (3,) or {foo:3}. This sort of indirection can be useful, but is fundamentally different than working with references.

Asking a reference for its value requires a level of communication - that is, introducing an entirely new, impure, non-value semantic concept. Logically, you:

  1. Send a request asking the referenced entity to return its held value.
  2. Include a return reference in the query (some languages make this implicit).
  3. Wait until a response arrives.

It is this layer of communication that allows interposition and all the various capability patterns (attenuation, revocable forwarding, logging, et cetera).
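
For instance, a hedged sketch of the revocable-forwarding pattern (names are mine; the 'capability' is modelled as a plain IO action): the forwarder consults a mutable flag on every use, so the grantor can later cut off access without the holder's cooperation.

    import Data.IORef

    makeRevocable :: IO a -> IO (IO a, IO ())    -- (forwarder, revoke)
    makeRevocable target = do
      alive <- newIORef True
      let forwarder = do
            ok <- readIORef alive
            if ok then target
                  else ioError (userError "capability revoked")
          revoke = writeIORef alive False
      return (forwarder, revoke)

    main :: IO ()
    main = do
      (cap, revoke) <- makeRevocable (putStrLn "used the capability")
      cap        -- works
      revoke
      cap        -- fails: "capability revoked"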

And all this is simply a defining characteristic of references: names that are distinct from the entity so named. Capabilities are clearly defined as references. It is not at all 'bizarre' that capabilities would have the characteristics of references.

An LC implementation of file systems with a pure LC implementation of monads is certainly possible. I'm not sure why you'd think otherwise.

I didn't say it was impossible, Sandro. Turing equivalence means you can 'simulate' file systems in almost any language. What you said, though, is that it is "just as easy" as modeling numbers. And that assertion is what I deny.

Besides, statements in a monad are no more 'pure functions' than would be a block of C code. Monads allow you to capture the entire block and execute it in an artificial environment, but the abstraction seen by the monad's user is impure. A statement made twice, even with the same arguments, can return two different values. Ordering of statements within the monad makes a difference in behavior. Clearly, the characteristics of pure functions failed to survive this transition.
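
A small illustration of that claim (my own example): the same monadic expression, evaluated twice, returns two different values, so it cannot be eliminated the way a duplicated pure expression could be.

    import Data.IORef

    main :: IO ()
    main = do
      r <- newIORef (0 :: Int)
      let step = modifyIORef r (+ 1) >> readIORef r
      a <- step
      b <- step          -- textually identical to the line above
      print (a, b)       -- prints (1,2), not (1,1)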

It is, of course, only natural that if we successfully model impure systems, we'll sacrifice the declarative benefits of purity once we're working inside that model. If you want to maintain a more 'pure' approach to modeling a filesystem or OS, there are several alternatives. Introducing temporal semantics (e.g. with functional reactive programming) is my favorite option.

Anyhow, numbers are relatively easy to model because you don't face the modularity and composition issues of monads. You can, with enough effort, separately develop a filesystem monad and an operating system monad... but a day later, you'll need to compose them, even make the two interact.

There is no wrapping or threading of external values, so did the capabilities suddenly disappear?

Monadic side-effects still count as side-effects. They are still 'out of band' with respect to the statement behaviors. However, whether they are "capabilities" requires further qualification. There are many ways to implement filesystem and operating system APIs without using capabilities. (It doesn't make much sense to ask whether capabilities 'disappeared' until you had them in the first place.)

Capability security spells out the safety conditions, while you are describing the mechanisms to verify said conditions, so they're not at odds.

"Capability" spells out the mechanism. The conditions fall under the purview of "security" in general. Capability security says that we achieve the conditions of security by use of capabilities (i.e. by designating authority with unforgeable references, then mediating access to these references).

The other security mechanisms aren't necessarily 'at odds' with capability security. (Some of them are even complementary.) But they are certainly distinct from capability security - they have different characteristics: different patterns and idioms, different administrative overheads, different scalability properties.

Since this feels important, I'm going to focus on this only for now: how do you define equivalency between these two programs such that they are not equivalent?

By the set of possible outcomes.

You suggested a transform that reduces any program from scenario (2) to the happy path or ⊥. But this transform does not result in a program equivalent to the one pre-transform.

It would be absurd to claim an NDTM is equivalent to a TM just because one of the NDTM's execution paths or termination states is the same. (One could prove all TMs equivalent if that were the case, by writing a single NDTM that covers every possible execution path.) The same principle applies here: in (2) you express an NDTM because every dereference can return OR fail.

If our level of abstraction is a cluster with distributed memory as above, do you consider communications happening within the cluster to be mediated by capabilities, or only communications leaving the cluster?

Communications don't need to "leave the cluster" to count. Capabilities can be used to mediate communication even within a closed system.

As I said earlier, what matters is the abstraction that developers are working with. Working with references is a very different abstraction than working with values.

Asking a reference for its

Asking a reference for its value requires a level of communication - that is, introducing an entirely new, impure, non-value semantic concept.

This is not the example I introduced at all, so I'll reiterate: I have a local integer, and a remote integer. My language abstracts over location so I don't care about a variable's location when I execute a+b. You've stated that the local integer is not a capability. You've said remote refs like this are horrible for value semantics. Perhaps so, but is it a capability? Either they both are, or they both are not.

Anyhow, numbers are relatively easy to model because you don't face the modularity and composition issues of monads. You can, with enough effort, separately develop a filesystem monad and an operating system monad... but a day later, you'll need to compose them, even make the two interact.

I think this misses the point. I agree there are well-known issues with composing monads. What my example illustrates is that starting with an entirely "pure" system with no connections to the outside world, we can mimic a traditional system with I/O. Does the simulation use capabilities or does it not? If so, how did we move from purity/value semantics to impurity/capabilities?

As another example which I posted to cap-talk, consider a purely functional OS which checkpoints the entire system state during GC. This system causes no explicit effects on its own with I/O handles and such, yet I/O happens. Are capabilities in use here?

Capability security says that we achieve the conditions of security by use of capabilities (i.e. by designating authority with unforgeable references, then mediating access to these references).

Only in systems that are not the LC, since the capability properties are already present there. Since we're arguing about the LC specifically, what I said stands: capability security is only a constraint on environment extensions, and the other systems you mentioned are mechanisms to enforce such policies (though they can enforce other policies of course).

You suggested a transform that reduces any program from scenario (2) to the happy path or ⊥. [...] It would be absurd to claim an NDTM is equivalent to a TM just because one of the NDTM's execution paths or termination states is the same. [...] you express an NDTM because every dereference can return OR fail.

As I argued at the start of that thread, ordinary references have the same behaviour (random bit flips, heat damage, etc.), just with different probabilities. The monadic encoding and the explicit dereferencing examples simply encapsulate and expose these semantics. Every path is the same in both programs; it's just that a program using remote references will take longer due to communication delays and congestion triggering wait loops.

Communications don't need to "leave the cluster" to count. Capabilities can be used to mediate communication even within a closed system.

I'm not denying that. I'm asking whether a language which abstracts over the location of values and transparently distributes programs across a cluster uses capabilities during its "pure computations", i.e. adding integers. Or does only communication that leaves the cluster count as using capabilities at this level? At the level of abstraction for this language, given your previous arguments about value semantics, you should conclude that within the cluster, this language does not use capabilities at the language level (though it may at the runtime level).

E-mail

I'll take further response to e-mail.

What is a "capability" in an "object-capability system"?

Same answer as I just gave at http://www.eros-os.org/pipermail/cap-talk/2011-February/014474.html

Here I'm just going to answer the definitional question this thread started with and address a few of the resulting concerns. Since my thesis seems to be the consensus defining document of the "object-capability model", I claim only to be speaking for what "capability" means in the object-capability model. After 45 years, it is not surprising that "capability" means different things within different models, much as "point" means different things within different geometric systems. Since I claim that DVH is an object-capability system (with a bug accidentally fixed by omission in its first implementation as the PDP-1 Supervisor), I will further claim that my definition is consistent with their usage of the term. Other definitions may also be consistent with DVH's usage -- this need not concern us here.

Sandro asks about the differences between my use of "capabilities" and the use in "Lightweight Static Capabilities". This is an awesome paper that I've learned a lot from. In introducing "capabilities", it cites

Walker, David, Karl Crary, and Greg Morrisett. 2000. Typed memory management via static capabilities. ACM Transactions on Programming Languages and Systems 22(4):701-771.

for "static capabilities" and the Ode for "capability". However, the Ode was written before we coined the term "object-capability" and stated the model precisely. So it shouldn't surprise us if the definitions differ. I do not argue that the notion in that paper is not (or even less) useful, even within object-capability systems. But it differs subtly from the meaning of "capability" in the object-capability model.

In my thesis, I do indeed define a capability as a "reference to non-data". I also define "reference" as

A reference provides access to an object, indivisibly combining designation of the object, the permission to access it, and the means to access it. Our access diagrams become Granovetter diagrams, where the permission arrows now depict references.

In this thread I've seen arguments that all references should be considered capabilities. But we use words to make distinctions. We have many more distinctions we need to make than we have good words available to name them. Often, we do indeed want to do security reasoning about references rather than capabilities, for example, to reason (however soundly) about confidentiality. Fine, just call those "references" rather than "capabilities". Both concepts are valuable, and the distinction is valuable.

The key to the difference of course lies in the definition of "data":
Data objects, such as the number 7. Access to these are knowledge limited rather than permission limited. If Alice can figure out which integer she wants, whether 7 or your private key, she can have it. Data provides only information, not access.

I think most in this thread have missed the significance of that. When we reason about capabilities by themselves, we are able to derive some security conclusions without concern about whether adversaries are able to gain knowledge that we might like to deny them. When Q and Bond are connected only by a bit channel, whether overt or covert, then Bond can come to know anything that Q knows, if Q tells Bond, and therefore Bond can come to hold any data that Q holds. But Q cannot transfer to Bond even a reference to an immutable object with unforgeable identity -- a Joe-E token -- because there's no way to transmit an unforgeable identity over a bit channel. That's the crucial difference between unforgeability and unguessability.

Sometimes, as in cryptography, we must reason about limits of knowledge. I agree that we cannot do so soundly, but that does not remove the need, so we proceed to reason unsoundly with engineering heuristics. Reasoning about "references" and overt data paths in object-capability systems helps. But doing engineering in the absence of sound reasoning is always hard, and the results will not be as reliable as we like to think. Absent sound reasoning, engineering does have other tools at its disposal: post public challenges with prizes for finding high-bandwidth covert and side channels in a given system. As time goes by with the prize unclaimed, escalate the prize money offered, and lower the bandwidth needed to claim it. If a large prize goes long unclaimed at a low bandwidth, and, crucially, if the prize also carries high prestige so that quality effort is really being spent trying, then you have something. It's just hard to state precisely what.

The crucial thing is to partition the system into
* those portions -- hopefully the vast majority -- for which we only need to reason about things we can reason about soundly, and
* those portions where we must engage in unsound reasoning to defend the security we must care about.

Distributed object-capability systems, where we have object-capability islands communicating over a cryptographic capability sea, as in E and Waterken, have exactly this separation http://www.erights.org/elib/capability/dist-confine.html .

Maybe a tangent...

I'm really not sure how this is related to the central argument here, but I admit to having grave doubts about what "purely functional actor" could even mean. Certainly from a semantic point of view, it seems to me like a contradiction in terms. Actors are fundamentally based on "send" and "receive", are they not? We could hand-wave over "send" and claim that it looks like a pure function to the actor performing the operation, but I don't see how to do that with "receive". The receive operation is just fundamentally impure and referentially opaque.

We could restrict actors in some way so that, for example, the "actor" is simply a single function which is called once for each incoming message (and perhaps returns the next such function to call on the subsequent message). But it's not clear to me that this is really the same thing, and in any case it seems like a perverse redefinition. At that point we're just programming to an event loop.

So I guess, it seems to me that as soon as you talk about actors, you're already in an effectful world. When people talk about purely functional actors, I confess, I get confused.

"purely functional actor"

Actors are fundamentally based on "send" and "receive", are they not? [...] The receive operation is just fundamentally impure and referentially opaque.

An actor, at least as originally defined, will atomically process each message. There is no 'receive' operation that would allow the actor to wait on a new message while processing the old one. (At least, not in general, though Erlang and some actors-model inspired languages have such a feature.) You cannot send a message then wait on a response (except by using stateful serializer patterns). You can send a message along with a reference to a 'continuation' actor that will be sent the response.

In a stateless actor, receiving a message doesn't look a whole lot different from a function receiving an argument.

Of course, the behavior is a lot different from that of mathematical functions - you cannot eliminate duplicate expressions, for example, and composition of results from two messages is difficult without going impure... or using monad-like sequencing of continuations.

In any case, this seems close to the restriction you are saying: a stateless actor is basically a function that returns a list of 'send' operations instead of a return value.
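
A rough sketch of that reading (all names are mine, purely illustrative): the actor's behaviour is a pure function from an incoming message to the sends it requests; the surrounding system, not the actor, performs them.

    type ActorId = Int

    data Send msg = Send { target :: ActorId, payload :: msg }

    type Behavior msg = msg -> [Send msg]

    -- Example: forward every message, unchanged, to actor 42.
    forwarder :: Behavior String
    forwarder m = [Send 42 m]

    main :: IO ()
    main = print (map target (forwarder "hello"))   -- [42]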

It seems to me that as soon as you talk about actors, you're already in an effectful world. When people talk about purely functional actors, I confess, I get confused.

I agree it would have been clearer to call them 'stateless' actors, but I've heard them called 'purely functional' before.

Eureka

That makes perfect sense. I didn't know that actors originally had that restriction.

Abstracting Glue

Another recent article explains monads as abstracting language glue.

As Derek notes, monads aren't about state or IO precisely. Gilad acknowledged this too - he asserts near the top of 'Maybe Monads Might Not Matter' that he's discussing state monads in particular (and not 'maybe monads'). (Note: Sean, you'll probably want to adjust your OP link to target the specific article, and not Bracha's whole blog. Also, your comments link is to the wrong post.)

I would say that the most interesting way to use monads is as a monad transformer or a monad typeclass. A monad transformer gives you 'layers' of implementation, and the typeclass gives you abstract implementation.
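
A minimal sketch of both uses (assuming the mtl package; the function names are mine): StateT layers a state effect over IO, while the MonadState/MonadIO constraints abstract over whichever concrete stack is chosen.

    import Control.Monad.State
    import Control.Monad.IO.Class (MonadIO, liftIO)

    tick :: (MonadState Int m, MonadIO m) => m ()
    tick = do
      n <- get
      liftIO (print n)
      put (n + 1)

    main :: IO ()
    main = evalStateT (tick >> tick) 0    -- prints 0 then 1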

Actors model can offer similar features with regards to 'layers' of implementation. In particular: dependency injection serves in place of monad typeclasses, and delegation chains (or mixins) serve in place of monad transformer layering. Using various object capability patterns, you can easily create confined 'sandboxes' inside a larger program.

Of course, both models are equally 'powerful' in the sense of Turing equivalence. But the argument that you can model actors with monads or vice versa is... much easier said than done. Getting sequencing right with actors requires pervasive discipline that is unlikely without special language support. Expressing actors model inside monads is non-trivial due to various issues of message routing and garbage collection; one will probably need some support from the implementation that isn't implied by monads (such as ST vars).

So, ignoring the question of 'power', what are the other major differences?

Monad abstractions (type-classes, transformers, etc.) must pipe all interaction and state maintenance through a common point in their implementation - this gives a lot of power for cross-cutting abstraction and features. For example, you can effectively implement 'backtracking' monads, multi-pass monads, continuation capturing, exception-handling patterns - various things that would be rather difficult to do with actors (without considerable discipline).
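
A tiny illustration of the 'backtracking' point (my own example): because every bind goes through the list monad's single implementation point, a failing branch silently backtracks to the remaining alternatives.

    pairsSummingTo :: Int -> [(Int, Int)]
    pairsSummingTo n = do
      x <- [1 .. n]
      y <- [1 .. n]
      if x + y == n then return (x, y) else []   -- dead branch: backtrack

    main :: IO ()
    main = print (pairsSummingTo 4)   -- [(1,3),(2,2),(3,1)]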

OTOH, this single-point-of-interaction is also a barrier to pluggable modularity and scalability (parallelism, distribution).

Once you start including special implementation features, or using the IO monad directly (instead of monadic typeclasses and transformers), however, the features blur a lot - the difference between actors and monads will be much less distinct.

Done, thanks!

Done, thanks!

Monads vs Actors is kind of silly

Honestly, I'm not even sure what point the original article is trying to make. As to your points:

1. What kind of "composability" are you talking about? I for one am not much a fan of monad transformers, in part because they help create an illusion that monads are more composable than they really are. Taking a monad and its effect basis and combining it with another monad and effect basis is in general a difficult problem, and in some cases may even be impossible. Take the lazy state monad and continuations, the ST monad and explicit concurrency, or the less-than-desirable interactions between lazy IO and mutex locks, for example.

2. Why hide the "monadic heritage" if your interface is good? Maybe monads are too hard for "average programmers", but recursion was also considered to be a difficult topic once upon a time. Where beginners often go wrong is that they try to understand what monads "are", instead of focusing on using one or two specific monads (such as IO, ST, or LINQ).

3. Monads are about much more than state, and even the state monad has some surprising properties in the presence of lazy evaluation. For an example of something completely different, LolliMon uses a monadic interface to erect a permeable barrier between forward-chaining logic and backward-chaining logic.

I'm not entirely sure how I feel about implementing applications in terms of their own application-specific monad mostly to manage state, such as the approach taken by the Snap Framework and its new extension hooks. But the whole "monads vs actors" debate seems rather silly to me. They are orthogonal concepts; monads are about a kind of interface, whereas actors are about a kind of concurrent semantics.

I don't...

...care that much about monads. I mean, yeah, they're nice, but there are lots of other nice categorical constructions too.

I thought this bit was funny, though:

What is important to me is that the notion of actors is intuitive (a pesky property of Dijkstra’s hated anthropomorphisms, like self) for many people. Yes, there are many varieties of actors and I have my preferences - but I’ll take any one of them over a sheaf of categories.

Actor-style programming is concurrent, higher-order, imperative programming in continuation-passing style. This is REALLY HARD -- as any attempt at proving the correctness of behavior-preserving refactorings of GUI programs will demonstrate. The only method I know of that can manage it is, in fact, sheaf categories. Kripke-style models are the only reasoning tool I know of that lets us say reasonably precisely what higher-order imperative programs should do.

Semanticists spent the 80s and 90s building very sophisticated semantic models of rather impoverished programming languages -- but no one is really that interested in an impoverished programming language. However, once you notice that "impoverished language" is another word for "DSL", then these methods become much more valuable as techniques for program proof rather than denotational semantics per se.

(EDIT: I should add that my laughter was not critical -- just rueful.)

Denotational Semantics for DSLs

Can you please elaborate on what you meant by the following?

However, once you notice that "impoverished language" is another word for "DSL", then these methods become much more valuable as techniques for program proof rather than denotational semantics per se.

Perhaps even examples or references...

Hi Ohad...

...take a look at this recent draft I wrote with Nick Benton on higher-order functional reactive programming. You'll find metric spaces and step-indexed models, both reasonably sophisticated semantic techniques, but here they are used to prove the correctness of a library. As an example of weakness becoming strength, our model is of a total language, which is obviously not general-purpose. But for our library it serves as the guarantee that clients can't accidentally define ill-founded feedback.

Another example is the Geometry of Interaction construction. Lurking within it is a bidirectional model of dataflow programming. I can't say much more about it, since I haven't tried too hard to program it up, but I would bet that the bidirectionality in it can be used to handle many patterns of imperative programming with aliasing.

Something more impressive

Something more impressive might be:


Here is a system we can describe using actors.

But, behold, we can simplify the system using a much simpler to understand system! Here is the category theory that explains it.

I am not familiar with such tutorials.

Thanks for the reference!

I see now. That's pretty cool!

I don't agree that the weakness is turned into a strength, though. The weakness is still there: denotational semantics doesn't yet have an account good enough to encompass everything that is out there. But people have done enough work that simple things have a well-thought-out, well-visited denotational semantics. So at this point we can tackle DSLs.

Can you say what obstacles the denotational approach lets you overcome more easily than an operational approach would? Would the same apply to richer languages, if our denotational machinery advances enough?

Monads are unsuitable for concurrency

Some researchers put forth the thesis that monads could help with concurrency. At this point, it looks like their thesis has failed.

The Actor model (e.g. see ActorScript(TM) extension of C sharp (TM), Java(TM), and Objective C(TM)) is a better foundation for concurrency than monads.

Concurrency was not relevant

Concurrency was not relevant to Gilad's argument.

In the age of many cores, concurrency is *always* relevant

In his blog, Gilad posted "Concurrency is never easy. Asynchrony, however, is very common in the real world and I think people can unlearn bad habits and use it successfully."

Not relevant to Gilad's argument

Did you actually read the argument? Or did you just see the word 'Actors' and assume?

The comment on concurrency was a comment responding to someone else's concerns about concurrency. But it still wasn't relevant to the argument presented in the main body of the post.

In the age of many cores, concurrency is *always* relevant

Concurrency certainly is a cross-domain subject. Performance is a cross-domain concern and its characteristics include efficiency, utilization, scalability, and latency - and choice of concurrency model certainly impacts those properties. Coordination and integration of multiple systems is another cross-domain issue where concurrency is relevant.

But not all arguments are about performance or coordination. The OP topic was not. It was about expressing effects.

Initiating change is inherently about concurrency

It is solipsism to imagine that effects in computation are expressed only to oneself (the one initiating change). Inevitably, change can affect others thereby throwing us into concurrency.

Effect Affect

Your statement is true, I grant, but still not relevant. The issue of how your effects affect others was not the subject of Gilad's argument.

And, relevantly, how your effects affect others isn't even a valid point of distinction between effects expressed monadically vs. effects expressed in actors. That is, monads don't prevent concurrency semantics - e.g. that a 'queue' is loaded by other (possibly monadic) agents.

Monads in the lambda calculus can't change anything

Strictly speaking, monads in the lambda calculus can't change anything.

Strictly speaking, neither

Strictly speaking, neither can a closed system of actors.

Monadic effects modify or observe the environment from one step to another. A change is observable to others in the same environment. These same properties hold for actors, though the modularization is different.

Even in a closed system, Actors can change

Even in a closed system, Actors can change. In the lambda calculus, nothing changes.

Ridiculous

Actors are not "privileged" when it comes to closed-system evaluation.

From an outside perspective, nothing 'changes' in a closed system regardless of the effect model - whether it be a state and concurrency monad, imperative, or actors. From a perspective inside the system, changes may be visible. But that, again, is true without regard to the effect model.

Viewed from outside, the behavior of Actors can change

Viewed from outside, the behavior of Actors can change. However, viewed from outside, the behavior never changes for objects implemented in the lambda calculus.

abstraction violation

You can judge two kinds of properties when studying any abstract model: first, what properties the model will possess if implemented correctly; second, whether a correct implementation (one that doesn't violate any axioms) will feasibly be robust, efficient, scalable, et cetera. But it is a mistake to confuse the implementation with the model being implemented. It is a mistake to confuse the second kind of property with the first.

We could implement actors model - complete with state and concurrency semantics - within the lambda calculus in a number of different ways (most likely atop some other abstraction, such as temporal semantics/FRP, or concurrency and state monads). The implementation could ensure 'all the axioms' of actors model are obeyed. This does not mean that actors inherit the properties of lambda calculus, nor does it mean that lambdas have the properties of actors model.

You seem to be equating 'monads' to 'lambda-calculus' just because the former are sometimes implemented in the latter. Don't.

Actors cannot be implemented in the lambda calculus

Actors cannot be implemented in the lambda calculus.

See What is computation? Actor Model versus Turing's Model for an informal account.

And see Common sense for inconsistency robust information integration using Direct Logic(TM) and the Actor Model for a formal published paper.

Also there is a video available at How to Program the Many Cores for Inconsistency Robustness with slides available here

Have you written any

Have you written any articles on search engine optimization?

Actors can be implemented in lambda calculus

Even if we assume your position on the relative 'computational power' of Actors (which I find extremely dubious), you have asserted: "What is an Actor? Answer: Anything that obeys the axioms!" This gives us a lot of flexibility for implementing actors model. We can faithfully model the message-passing concurrency (e.g. using branching time, oracles, or a concurrency monad), we can faithfully model your 'fairness' axiom (e.g. using a diagonalization on the concurrency model), and we are even free to choose a deterministic execution path (since every fair message arrival-ordering will "obey the axioms" of the actors model). The major trouble LC would give is for open modularity, which is a non-issue for the closed-system case.

So your words are, frankly, wrong and inconsistent when you claim we cannot implement actors model with lambda calculus. You could make a lesser claim: that the implementation of the Actors model on a Turing machine will not have the same freedom (same range of outputs) as Actors model. This claim is still extremely dubious, especially since the number of undelivered messages is bounded at any given instant, but would at least be consistent with your views on Actors computation.

Also, since you've already posted those articles on LtU, you should link back to the corresponding LtU articles instead of directly to your site.

Theorem of the Actor Model: Lambda calculus isn't universal

There is no "fairness" in the Actor Model. However, there is an axiom that every message sent will be received. There is another axiom that messages can be received in any order that is compatible with the laws for Actors.

Computation in the Actor Model is characterized by the Computational Representation Theorem. See Actor Model of Computation

There is no "fairness" in

There is no "fairness" in the Actor Model. However, there is an axiom that every message sent will be received.

That would be what I meant by a fairness axiom.

There is another axiom that messages can be received in any order that is compatible with the laws for Actors.

Indeed. And, from this axiom, you can conclude that even a deterministic ordering is okay for an implementation, so long as the ordering is compatible with the laws for actors. If one wishes to model all possible message arrival orders, that goes above and beyond what an 'implementation' requires.

Theorem of the Actor Model: Lambda calculus isn't universal

I believe that Thomas Lord totally destroyed your argument in an earlier thread.

A NDTM cannot implement unbounded nondeterminism

Plotkin proved that a Nondeterministic Turing Machine cannot implement unbounded nondeterminism. See What is computation? Actor Model versus Turing's Model.

Thomas Lord was having difficulty understanding this in the earlier thread.

so-called unbounded nondeterminism

You agree that an NDTM can be understood as describing a tree of achievable configurations from some starting configuration, right?

To prove that an NDTM can diverge, you must prove the existence of a configuration within that tree with the property that there does not exist any descendant configuration which is in a final state.

Take the simple "counting" NDTM we discussed earlier, comparing it to the "counting" Actors program. The configuration tree of the NDTM has unbounded depth and finite branching factor. There does not exist anywhere in that tree a node (a configuration) with no descendants in the halt state. That NDTM does not diverge. There is no bound on how high it can count. It exhibits unbounded nondeterminism, if you want to put it that way.
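
To make that concrete, a sketch of the counting machine's outcome set (my own encoding, using a lazy list to enumerate the possible halting results): at every step the machine either halts with the current count or keeps counting, so every configuration has a halting descendant, yet no bound exists on the integer a halting run can produce.

    count :: Integer -> [Integer]
    count n = n : count (n + 1)      -- "halt with n" or "continue counting"

    main :: IO ()
    main = print (take 10 (count 0))   -- any finite prefix: 0,1,2,...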

foundations

I was looking at Actor Model of Computation . An example of the foundational problem can be found in that paper. The claim is made:

1. There is a bound on the size of integer that can be computed by an always halting nondeterministic Turing Machine starting on a blank tape.

2. By contrast, there are always-halting Actor systems with no inputs that can compute an integer of unbounded size.

Nowhere is the term always halting nondeterministic Turing Machine given a precise definition.

In the classical formalisms I know of, if an NDTM is structured so that there is an upper bound on the number of steps in any run of the NDTM, we note that the machine belongs to a sub-class of NDTMs: non-deterministic finite automata. It is well known that a finite automaton cannot generate an arbitrarily long integer.

In the classical formalisms I know of, if the NDTM is structured so that it can reach some non-final configuration from which no final configuration can be reached then we say that the NDTM is divergent - that it, at least in part, denotes ⊥.

A simple counting NDTM has neither of those properties. There is no upper bound on how long it can run (just like the counting Actor). It does not denote ⊥ (just like the actor). The machine is "total". Just like the Actor machine.

The paper also tries to make a distinction between "enumeration" and "implementation" and distinguishes the Actor model as being especially based in physics. There is a problem there.

Implicit in a proof that an actor program can count to any possible number while still being guaranteed to halt is the assumption of a physically realized arbiter which can delay delivery of a message indefinitely, yet in some sense be guaranteed to deliver it. The math there is formally unconstructive, and the physical intuition is unconstructive. I can't imagine how to build an arbiter with the desired properties. I can't really do any computation with a supposed integer that exists but which cannot be deduced.

What has really happened there, it seems to me, is that an uncomputable oracle has been snuck into the Actor axioms. The Actor counting program isn't really counting, at all - it is reading input from a physically unrealistic oracle. It is the oracle, not the actor, that is generating an arbitrary number with no upper bound.

Back before I heard of any of this "NDTMs aren't universal" nonsense, my original understanding was that Actors are a cool formalism not because they were somehow a more complete model of computation than TMs or lambda calculus or any of the others -- but because the Actor formalism can be particularly parsimonious in some situations. I wish the contemporary emphasis were more about that.

Simple Actor system that computes an unbounded integer

There is a simple Actor system that computes an unbounded integer in
Unbounded nondeterminism in ActorScript(TM).

It does not make use of an oracle. And it is doing the counting itself.

re: Simple Actor system that computes an unbounded integer

There is a simple Actor system that computes an unbounded integer in Unbounded nondeterminism in ActorScript(TM).

It does not make use of an oracle. And it is doing the counting itself.

If I physically manifest that Actor program, what determines which integer is returned? You yourself have described the answer as a "non-public computation" - the arbiters that decide message delivery order. It is certainly non-public, but it is not a computation, for you have specified no way to compute it. You have merely axiomatized it into existence. It is an oracle. In this case, it is an oracle that returns some non-negative integer, without further constraint. This is a lot like postulating a Turing tape initialized to an arbitrary integer, and then having a regular expression recognize it. Your mathematical account is weak. It does not mean what you think it means. I don't think that this detracts from the value of the Actors formalism, but it seems to be important to you to think otherwise.

I give up. I have not found a way to engage you in a way that indicates you are actually responding to what I and others are saying on this topic. As I said a few posts back: sorry. I set out to pick some lint on what you were saying (and/or how you said it) but apparently I can not.

Computational Representation Theorem

The Computational Representation Theorem (see above) says that the program returns an integer of unbounded size.

yes it does.

.

The big picture

Good! Try to put together the parts that you understand to create a larger picture. Some of the missing pieces can be found in the published literature and videos.

Lacking answers

It is a shame that when Thomas asks you a direct question about the subject at hand, you simply avoid it.

You yourself have described the answer as a "non-public computation" - the arbiters that decide message delivery order. It is certainly non-public but it is not a computation for you have specified no way to compute it. You have merely axiomitized it into existence. It is an oracle.

What is a constructive definition of an arbiter?

This "arbiter" source of

This "arbiter" source of unbounded nondeterminism seems to be the nub of the matter, all right. In its Heisenbergian unobservability, it's strongly reminiscent of the quantum magic postulated by Penrose for his hypothesis of the non-computability of human intelligence.

What Hewitt seems to be suggesting is that if only we could make rolling a die of some kind a part of our Turing machine, it would suddenly become a more powerful computational model. But if we have a deterministic source of pseudo-randomness that is statistically indistinguishable from the output of a physical circuit, how would that be any different? More importantly, how do we know that our arbiter is really not actually deterministic inside its oracular box?

(My practical, rather than theoretical, orientation is showing, I'm sure.)

Arbiters are one possible source of unbounded nondeterminism

Arbiters are one possible source of unbounded nondeterminism. But there are others. For example, a mail server might crash and it could take an unbounded time to recover the disk contents and restore it to operation.

Arbiters can be constructed so they are physically symmetrical and then strongly isolated from the environment. So it is hard to imagine a source of bias.

It would be nice to have a derivation of arbiter indeterminacy from the quantum physics of integrated circuits.

I've spent a bunch more time

I've spent a bunch more time reading, and I understand better what you're getting at now. It still seems a very fine distinction from a possibly non-halting non-deterministic Turing machine, but I don't have the inclination for the rigour necessary to see what difference it makes.

I'll butt out now...

Don't be discouraged

Don't be discouraged. These distinctions are made for complicated theoretical and practical reasons. My colleague Edsger Dijkstra (who was brilliant!) ruled out unbounded nondeterminism. See Dijkstra believed that unbounded nondeterminism is impossible to implement.

Example doesn't follow

Your example of a mail server doesn't work for two reasons:
1. There is no guarantee that the recovery process will terminate at all.
2. If the mail server is not interacting with the network (i.e. looking at the local disk for recovery), then it can be modelled easily as a Turing Machine. Perhaps you thought otherwise because of point 1?

The mail must go through!

In practice, redundant mail servers and other measures will be taken to make sure that the mail gets through.

In practice

those measures are not perfect. As those systems do not provide an absolute guarantee of delivery, they are not an example of an arbiter. In practice those machines have finite-sized buffers and drop messages in the worst case.

Can you provide a constructive example of an arbiter that performs this kind of guaranteed message delivery?

Arbiter

What is a constructive definition of an arbiter?

Maybe naive, but the "it's guaranteed to arrive but there's no bound on when" sounds a lot like an axiom of choice over an infinite domain (with what cardinality?). If so, that means there can be no constructive formalization of it. We have to assume it or something similar as an axiom bolted on to whatever logics we might normally use for computation. And that produces a nonconstructive logic that can prove things that can't be proven in the constructive realm of LC and Turing Machines.

And if all that is so, the argument seems to fundamentally be about whether you can call such a nonconstructive logic "computation" and whether some aspect of the physical universe, e.g. some real computer system, is usefully modeled this way.

Not choice

I don't think this is related to the axiom of choice. It's "non-constructive" in the sense that we can't actually build it, but I don't think there's much deep here. To take Hewitt's example of a mail server going down and being brought back up, well, there's some chance it will not be able to be brought back up. In general, any nondeterministic mechanism we can build that can delay indefinitely can delay forever. It's just an idealization. As Tom correctly points out, there's nothing the actor formalism can prove that can't be proven about some corresponding NDTM.

Maybe not choice

I'm not sure it's choice either - it just vaguely seems like, with the unbounded number calculation example, we can model the non-arbiter part as something that can compute an infinite list of bins with {0,1} in each one, and the arbiter part as a choice function that decides which bit to choose from each bin.

I didn't mean nonconstructive in the sense that we can't build it. We can't build a Turing machine with an arbitrarily large tape either. I meant nonconstructive in the logic sense.

As for the mail server, sure the data might be permanently lost. But is it in any sense useful to ignore that wrinkle and have a model that guarantees it will come back some day? Just like it's sometimes useful to pretend our computers can't run out of storage - e.g. any time we use big O notation we implicitly ignore the finite size of real boxen.

Shrug

Proving that there exist choice functions for countable sets of finite sets doesn't require the axiom of choice - the sets involved are small enough that you can construct the possible choice functions inductively. But I don't think it's even logically non-constructive in any important way that differs from the finitely non-deterministic case.

I agree with your point that it might be useful to reason about eventual delivery, but that can be done using NDTMs as well.

Countable Choice not provable in ZF

If I recall set theory, even countable choice takes an axiom (over Zermelo-Fraenkel set theory). It is weaker than the full axiom of choice.
http://en.wikipedia.org/wiki/Axiom_of_countable_choice

Given a countable collection represented as a function g on
natural numbers, I don't see how to prove the existence of a choice function:

exists f . all n in N, f(n) in g(n)

from non-emptiness of the collection:

all n in N, exists x in g(n)

Perhaps you are thinking of constructive mathematics - there, choice is valid for rather neat reasons. To give a constructive proof that a set of choices even exists in the first place, you have to have a procedure which produces, for each set of choices, a constructive proof that it is nonempty - which is just an element!

You missed a clause in my claim

You left off the important condition that all of the sets of choices are finite. That means you can inductively number all possible choices globally, and then for your choice function inductively choose the least numbered choice at each index. (This is rather informal, but I think that would work... set theory's been a while for me as well)

Not even pairs

Even assuming each set of choices has only two elements is not helpful. To make use of the isomorphisms with {0,1}, you would first have to make a countable choice of which of the two isomorphisms to use for each set of choices.

I think Matt M's idea was

I think Matt M's idea was not to pick, for each set, a bijection with [1..n] -- which, as you noted, only pushes the problem back to choosing the bijections -- but rather to pick an enumeration (x_n) of the union of all such sets, then associate to each set Xi the smallest n such that x_n is in Xi.
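Spelled out (a sketch of that construction; (x_n) is the assumed enumeration of the union, and m_i is just my name for the chosen index):

f(X_i) = x_{m_i}, \qquad m_i = \min\{\, n \in \mathbb{N} \mid x_n \in X_i \,\}

This is well defined as soon as each X_i is nonempty; the remaining work is exactly producing the enumeration of the union.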

This shifts the discussion towards the question "is a union of finite sets enumerable?". The slightly stronger case of a union of countable sets is apparently outside ZF; see the article - it actually shows that "countable union" implies countable choice over countable sets, but that the converse is not true. I would expect the finite case to be similar, for I do not see how to build an enumeration of the union without picking an enumeration of each element, but you need a lawyer^W set theorist here.

I always found those questions too detached from reality anyway. In nearly all cases where you would want to use such a theorem, you can make a stronger, more constructive hypothesis such as "a family of finite sets, each accompanied with a bijection into [1..n]", which solves the problem nicely. Mathematicians sometimes think they need the axiom of choice to state theorems in full generality and elegance ("for *all* sets ..."), but actually they're only doing early binding to the general axiom of choice, which is a bad engineering practice. They could just as well avoid the hard-coded dependency and claim "for *all* sets with a suitable choice condition ...", leaving to the user of the theorem the choice to invoke a magic general axiom or to use a fine-grained constructive construction which applies in their case.

This is not really a problem anyway, because people working in domains where there are subtleties with AC are careful about it, and people who don't care work in domains where restating the theorem in a more economical form is usually very simple and entirely orthogonal to the interest and content of their work.

Thanks for the link. I

Thanks for the link. I didn't know that ZF couldn't prove CU. I see the problem with my reasoning - I don't have a way to pass from an arbitrarily long sequence of choices to an infinite sequence of choices without implicitly invoking the axiom of choice.

I still don't see the connection of the original context to the axiom of choice. In fact, James Iry's comment (to which I replied with these claims that have gotten me into trouble) mentioned choosing a value out of {0,1}. I wish to modify my argument for the existence of a choice function as follows: choose 0.

Connection

Knowing that every set is actually {0,1} is the sort of condition that makes it solvable.

I agree with gasche that this is exactly the sort of nonconstructive and subtle thing that should rarely be key. That's exactly why I'm skeptical of the fairness conditions that are suggested for Actors.

If you are making assertions about infinite behavior that do not follow from standard limits of the behavior of finite executions, and are only supported by an axiom that only applies in infinite cases, I don't see why you should bother with the claims or the axiom.

The specific connection to the axiom of choice comes from considering the example of the counting actor. There are possible prefixes of executions

_|_, S _|_, S Z, S S _|_, S S Z, ...

where S means the actor receives its own "inc" message, Z means it finally received the "halt" message, and _|_ means an "inc" and a "halt" message are still in flight. However, it's asserted that there is no infinite execution.
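For concreteness, here is a small sketch of that execution set (my own illustration in Haskell, not part of the actor formalism; the names Event, executions, and result are mine): every allowed execution delivers "halt" after finitely many "inc" deliveries, yet there is no bound on the resulting count.

data Event = S | Z deriving (Eq, Show)

-- Each allowed execution is a finite prefix of S's (the counter receiving
-- its own "inc") followed by Z (receiving "halt").
executions :: [[Event]]
executions = [ replicate n S ++ [Z] | n <- [0 ..] ]

-- The number returned by a given execution.
result :: [Event] -> Int
result = length . takeWhile (== S)

-- result (executions !! 3) == 3; every natural number is a possible result,
-- but the all-S trace "S S S ..." is not in the list -- that is exactly the
-- trace the side conditions have to exclude.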

Without side conditions to filter the set of infinite executions that may be considered, this would violate Koenig's Lemma, so it seems there's no way to reason nicely about the infinite behavior of actor systems without explicitly considering fairness - or, at the very least, such a system would let you refute the axiom of choice.

Similarly, it seems that the "computational representation theorem" doesn't actually imply the set of possible behaviors of an actor system is computable, because that \sqcup_i is taken in a nonstandard domain. A Turing machine that emits finite strings is said to emit also the set of possibly infinite strings which have arbitrarily long prefixes emitted eventually by the machine. If the strings are just a linearization of the events in a partial execution, then any Turing machine for the counter example does produce the infinite execution after all.

Will Clinger's thesis is fairly precise, but it seems to require that an actor only compute for a finite time before becoming ready to receive another message, which means a fair execution can simply be stated as one where all messages have been received. This is nice for things like the pi-calculus, but I don't see how it works for the extensions of the actor model to standard programming languages, where the code in an actor might enter an infinite loop. Perhaps you could affix an input queue to each actor, and let an actor "receive" a message by enqueuing it and becoming committed to a particular receipt order before it actually tries again to listen.

Anyway, I think this all stems from Hewitt being too insistent that it's actually possible for a particular actor system to produce unbounded numbers. I believe it's actually impossible for any implementation of actors to have that property, but the claim is unobjectionable in more constructive language:

The actor axioms prohibit any implementation of this program from running forever, but do not prohibit it from producing an arbitrarily large number. For any particular number, there is a system that obeys the actor axioms and might produce that number from an execution of the counter program.

These two claims are not incompatible with every particular implementation of actors having a finite upper bound on the possible results of the counter program.

After all, it's quite unusual to give a nondeterministic specification, and require an implementation to possibly produce every behavior allowed by the specification!

This connection to choice

This connection to choice seems like a stretch to me. If you assume that traces are made up of repeated finite non-deterministic branches (which is the setup I infer from your reference to Koenig), then maybe you can conclude that 'S S S ...' is a valid trace and get a contradiction to the actor axiom. But I don't see how Koenig's lemma or choice are required for that argument to go through.

RE: "computational representation theorem" Do you not agree that the of outputs of an actor system is recursively enumerable? Or do you agree, but just have an issue with the proof?

Enumeration

I doubt the set of all allowed event sequences is recursively enumerable. The limit in the theorem is taken in an incomplete domain, equivalent to taking the usual limit and then filtering out unacceptable strings. Neither acceptability nor unacceptability seems to be semi-decidable, so this filtering is suspect.

Why Actor Axioms Are Non-Constructive

Without loss of generality, we can inductively say that we start with some Actor0 that receives a unique message Start, and that the only state associated with that Actor0 is a finite ActorStepProgram, which can be defined as naught but a particular and explicitly listed mapping between a few finite sets we can lay out. Given a (finite) set of Actors and Actor states, and a choice of which messages are delivered with space-like separation in a given step, the ActorStepProgram determines the next state of the Actor network, including the new set of messages "in flight".

That's a sketch of how Actors were defined in the first place. That's where Hewitt started. I admire the nice idea of using a simple axiomatic description of a causality lattice modeling communication among space-like-separated step-performing active nodes. I like Lamport's version of it for a narrow purpose. I like Hewitt's for a broad purpose. But...

The details of the ActorStepProgram induce a causality relationship among message transmission and receipt events - a partial order. Hewitt conjectured that any pair of such events, E1 and E2, are either causally unrelated or are separated in a chain of causality by no more than a finite number of other causally related events.

Clinger in some sense proved that Hewitt's conjecture was independent of Hewitt's axioms. For example, it would be consistent to create a theory in which we have an axiom that says there exists an E1 separated from an E2 by an infinite chain of causality. It would also be consistent to add an axiom which says that no such infinite chain of causality exists.

The latter possibility turns Hewitt's unprovable conjecture into an axiom, so they went with that. Swell.

In my view of constructive math, we left the rails already with Hewitt's initial conjecture. Hewitt's conjecture seeks to deny the existence of certain traces that only "exist" if we add something missing from the original Actor mechanisms: an axiom of transfinite induction. The distinction his original conjecture seeks to make is formally senseless if we take his original axioms literally.

Clinger (as I understand history) informally bought into the threat of this transfinite induction monster and formally, I think, made a muddle. Evidently, now, we have an axiom of transfinite induction - and so the counting Actor can compute omega (and then quit or keep going, I guess, depending on how you frame it). And we need this new axiom to exclude those monsters. So, right away, whatever theory we're allegedly talking about has some underspecified and non-constructive law of things that exist by transfinite induction, from which the new axiom is going to subtract things out.

I'm not claiming here that that finiteness axiom makes Actor theory inconsistent. I am claiming that it's an axiom based in an apparently non-constructive, but kind of muddled, view. It could very well lead to inconsistency depending on how you make it precise.

In any event, we don't need that axiom. It doesn't make Actors "universal" in some way that Turing Machines or the lambda calculus etc. are not. The hype around that claim is what brought me into this discussion in the first place, way back when. Just don't assume any existence based on transfinite induction - something you aren't entitled to do by default - and you're fine. The axiom solves a non-problem.

All of the confusion that I think is there in that history predates Hewitt. The muddle starts with the "controversy" over the concept and fretting about "unbounded non-determinism". People encountered questions like "Can you prove that this non-deterministic program terminates? and can return any non-negative integer?" and got confused about how well those common-sense questions translate into formal logic.

too ugly

I think that my statements have been pretty solid and don't merit the condescension, but there's a bigger problem. The conversation is starting to take on a flavor of other people saying to us "Let's you and him fight." Bah to that. Catch you down the road when it's interesting again.

Well said

Apologies if my comment came across that way. It wasn't intentional. I'm more interested in hearing a solid answer to some of the questions posed as they seem to get to the core of what is interesting about Actors.

Can you explain

If I understand the example referenced above it is saying that an actor can produce an unbounded sequence because its stop message can be delayed for an unbounded time.

Can you explain how to distinguish between an unboundedly delayed message and a message which is never received, which is contrary to one of your axioms stated elsewhere?

Fairness Axiom

Carl Hewitt is attempting to make a distinction based upon the "always terminating" property.

The actors model has a fairness axiom that says every message must eventually be delivered. This means that no message can be indefinitely delayed by other messages. The axiom isn't trivial - e.g. you can use it to reject a FILO implementation of message passing.
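For instance, here is a minimal sketch (my own illustration in Haskell; the names Mailbox, deliverLIFO, and starved are mine) of why a stack-like mailbox falls foul of the axiom: as long as new messages keep arriving, the oldest message is never delivered.

-- A LIFO ("FILO") mailbox: delivery always takes the most recent message.
type Mailbox a = [a]

push :: a -> Mailbox a -> Mailbox a
push = (:)

deliverLIFO :: Mailbox a -> Maybe (a, Mailbox a)
deliverLIFO []       = Nothing
deliverLIFO (m : ms) = Just (m, ms)

-- Interleave one arrival before every delivery: message 0 (sent first) is
-- never at the top of the stack, so it is never delivered -- exactly what
-- the fairness axiom forbids.
starved :: [Int]
starved = go [0] 1
  where
    go box n = case deliverLIFO (push n box) of
                 Just (m, box') -> m : go box' (n + 1)
                 Nothing        -> []

-- take 5 starved == [1,2,3,4,5]; message 0 never appears.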

The problem, as Thomas has noted, is that this axiom is not constructive. There is no way to look at the state of the model and determine an upper bound on the delay for the delivery of a particular message. Every possible state in the actors model has a corresponding state in an NDTM, and vice versa. The only distinction to be made is in the axioms.

IMO, all that 'unbounded integer' really means is that the model doesn't tell us which integer is returned (except that it is non-negative). Properties left artfully undefined in a model serve as a degree of freedom for the implementation, something that can be further specialized (e.g. a temporal or vat-based actors model), not as a requirement that the implementation or language be poorly defined. Even an actors model language or implementation that deterministically chooses an integer could still obey the axioms of the actors model.

Thanks

I think my misunderstanding was to take unbounded to mean possibly infinite rather than finite but undefined.

Isn't an actor for Hewitt

Isn't an actor for Hewitt something real which cannot be formalized completely? In a sense a TM can only fake real communication. Concepts like "space" and "distribution" remain metaphorical. There is only a tape and read/write operations. But even the tape and the read/write operations do not actually happen or exist; they are only denoted. The meaning must be present elsewhere - in the mind or the physical world - and this is not going to change even when presenting a semantic theory, which also means nothing on its own: a mathematical object doesn't exist either.

Isn't an actor for Hewitt

Isn't an actor for Hewitt something real which cannot be formalized completely?

Is there evidence that "reality" is not formalizable?

John Searle would probably

John Searle would probably make the common-sense remark that milk can be formalized but you can't drink the formal model. So there is a gap which can't be overcome by putting more elaboration into the model, e.g. by modelling yourself as a milk-drinking person. This still happens in the model, i.e. it doesn't happen; it's only a fantasy.

Now you can put actors and computations on the side of the real and reject the idea that they can be implemented using a TM because of asynchrony. "Implementation" serves the same purpose as "drinking" in the allegory above. It happens outside of the fantasy - unlike the actors model.

I meant the question only in

I meant the question only in the sense that reality could itself be a formal system whose semantics we are discovering via physics. So "real" and "formal" are not necessarily distinct.

I think Tom's arguments are spot-on: there is an implicit oracle introduced by appeals to "reality". We don't know anything concrete about the boundedness of reality, and reality need not fulfill that requirement, so introducing an axiom based on such assumptions isn't justified.

foundations

"Actors cannot be implemented in the lambda calculus."

You cannot formalize that claim in a way that will be widely convincing.

Scientific Progress

Allegedly, Max Planck said that "Science advances one funeral at a time."

Kuhn, too.

Kuhn said something similar. Sorry.

Not clear that Kuhn said something similar.

Where did Kuhn say something similar?

Kuhn

I consider his description of paradigm shifts (and the resistance they encounter) as similar. I think Kuhn did, too. I don't have a copy of The Structure of Scientific Revolutions at hand, but the search engines tell me that he quotes Planck about the death of old scientists on page 151. I actually was lucky enough to see him lecture once, though when I was too young to fully appreciate who I was going to see (the person who dragged me handed SSR to me a few hours before the talk). I think I remember him talking there about how, um, attrition let's call it (combined with fewer and fewer new adherents) is characteristic of some paradigm shifts and is a symptom of the difficulty with which transitions between incommensurate ways of looking at things are made.

Kuhn was great!

When I was at MIT, I attended his lectures. He was fantastic!

I've a feeling that

I envy you that opportunity.

I've a feeling The Structure of Scientific Revolutions may have said something about older members of a scientific community who never accept a new paradigm, but I may just be imagining that because it seems so plausible that it would have. I can pretty much guarantee there weren't any quotable quotes about it, though; much though I treasure my copy of that book (I can see it about four feet away from me on the shelf as I write this), I've found it wall-to-wall dry, verbose writing — deeply insightful doesn't guarantee sound bites.

The population-replacement effect can't be reliably applied to specific situations, though. What makes science different from art, religion, and such is that an idea flying in the face of conventional wisdom isn't likely to supplant the conventional wisdom unless it can best it on objective grounds.

That's exactly what it says

By the way, you can speed-read SSR simply by reading the first and last sentences of every paragraph. Kuhn's writing style fits that speed-reading trick beautifully. That will answer your question in much more detail than I care to here.

Here you go, top of page 151

Darwin, in a particularly perceptive passage at the end of his Origin of Species, wrote: “Although I am fully convinced of the truth of the views given in this volume …, I by no means expect to convince experienced naturalists whose minds are stocked with a multitude of facts all viewed, during a long course of years, from a point of view directly opposite to mine. . . . [B]ut I look with confidence to the future,—to young and rising naturalists, who will be able to view both sides of the question with impartiality.”7 And Max Planck, surveying his own career in his Scientific Autobiography, sadly remarked that “a new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die, and a new generation grows up that is familiar with it.”8

7 Charles Darwin, On the Origin of Species … (authorized edition from 6th
English ed.; New York, 1889), II, 295-96.
8 Max Planck, Scientific Autobiography and Other Papers, trans. F. Gaynor (New
York, 1949), pp. 33-34.

<3 EBooks

Nice quote!

Good work digging up the quote! Do you also happen to have the surrounding text of what Max Planck actually wrote, if anything?

Thanks!
Carl

Do you really want the full quote?

He was talking about Boltzmann being a sarcastic jerk to him.

E-mail me if you want the full thing.

Thoughts on the actual topic

Having read Gilad's blog post, I don't see much to comment on, but I will comment on whatever substance I can find between the initial post and the ensuing comment section and tweets to James Iry:

Gilad says compositionality is key. He says he explicitly did not want monadic behavior for his parser combinators. Others point out that monads do provide compositionality. Greg Meredith and one other commenter provide the only real signal in the rest of the discussion, noting that whatever framework you develop, it will always have some shortcomings in how programs compose. Meredith gives the example of the Fork-Join pattern and says compositionality is key, to which Gilad agrees.

Personally, I don't understand what we should be discussing.

If we're discussing hiding monads with syntactic abstraction, then there are many angles to view this from.

First, why shouldn't modern VMs like the JVM and the CLR support higher-kinded polymorphism natively?

Second, syntactic abstraction makes reverse engineering the code harder. How much harder, I don't have enough experience to say. But one of the extremely nice things about CLR byte code is how easy it is to read it and picture what that byte code looks like in C#, and decompilers like Reflector can even do this for you automatically. That opens up new opportunities for debugging and source code visualization.

Third, using direct overloads to support ad-hoc polymorphism in place of higher-kinded polymorphism, as is done in .NET for scenarios where something like typeclasses might be used instead, might be worth studying. We can actually take real-world examples of how .NET has had to structure its API differently due to the lack of that feature, and study how research from Haskell gets ported to .NET. The impetus for this idea is studies I've seen done by Patrick Lam, where he searches code for unusual API usages based on the statistical distribution of what's going on, and tries to look at the outliers and see why the standard mechanism didn't work right. We could perhaps apply that same idea here.

Fourth, I don't understand why, with Newspeak's pluggable types, a type inference system based on monads is not possible. It doesn't exist, and it is not Gilad's personal preference. But I don't see why it can't be done.

Monads are much nicer with inference: "result type overloading"

Third, using direct overloads to support ad-hoc polymorphism in place of higher-kinded polymorphism, as is done in .NET for scenarios where something like typeclasses might be used instead, might be worth studying. We can actually take real-world examples of how .NET has had to structure its API differently due to the lack of that feature...

Fourth, I don't understand why, with Newspeak's pluggable types, a type inference system based on monads is not possible. It doesn't exist, and it is not Gilad's personal preference. But I don't see why it can't be done.

One issue with monads is that they tend to require a strong form of inference, which is different from the runtime ad-hoc polymorphism provided by most not-so-typed languages. Indeed, the `return` of a monad `M` has type `∀α. α → M α`. When you write `return 1`, there is no runtime value that carries the monadic type. If you haven't inferred it statically from the context -- e.g. `runST (foo (return 1))` -- you cannot guess which dictionary to associate with the return value.

For that reason, when you use monads in a non-inferred language, you need some kind of annotation: in a typed language you may need to make explicit which monad is meant, and in an untyped language you will have to pass the monad dictionary directly.
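A minimal sketch of what passing the dictionary directly looks like (Haskell is used here only as notation; the MonadDict record and the names below are illustrative, not from any particular library -- in an untyped language the record would simply be an ordinary runtime value handed around by the caller):

{-# LANGUAGE RankNTypes #-}

-- An explicit monad "dictionary": since `return 1` alone does not say
-- which monad is meant, the caller supplies the dictionary by hand.
data MonadDict m = MonadDict
  { ret  :: forall a. a -> m a
  , bind :: forall a b. m a -> (a -> m b) -> m b
  }

listDict :: MonadDict []
listDict = MonadDict { ret = \x -> [x], bind = \m k -> concatMap k m }

maybeDict :: MonadDict Maybe
maybeDict = MonadDict { ret = Just, bind = \m k -> maybe Nothing k m }

-- The choice of monad is made by the choice of dictionary:
one :: [Int]
one = ret listDict 1       -- [1]

one' :: Maybe Int
one' = ret maybeDict 1     -- Just 1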

This article on monads in Common Lisp explains this very issue. Note that in Lisps, monads can be represented as macros rather than as functions using a runtime genericity mechanism, which basically works around the whole issue.

Haskell uses an elaboration phase after type inference, where the type classes are translated away into dictionary-passing style. For this reason, the runtime semantics of a Haskell program directly depends on its types. Pluggable types advocate the opposite view, where types are only used for static analysis and can be erased without changing the runtime semantics. So those two features can be seen as contradictory.
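A rough illustration of that elaboration (my own sketch, not GHC's actual output; member and memberD are names I made up): the constraint becomes an extra value argument, so which code runs depends on the dictionary the type checker picked, and that argument cannot be erased.

-- Source program: the Eq constraint is implicit.
member :: Eq a => a -> [a] -> Bool
member x = any (x ==)

-- After elaboration (sketch): the constraint has become an explicit
-- equality function. Types can be erased from here on, but erasing this
-- argument would change the program's behaviour.
memberD :: (a -> a -> Bool) -> a -> [a] -> Bool
memberD eq x = any (eq x)

-- The "instance" is just the value that gets passed:
ints :: Bool
ints = memberD (==) (3 :: Int) [1, 2, 3]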



PS: type erasure can be seen as desirable even in richly typed languages. It is usual, when developing new type system features, to prove that they preserve the erasure semantics of the language -- e.g., when adding System F style explicit type polymorphism, you check that it doesn't change evaluation behaviour, which would do weird things with side effects.
However, things are not black and white: you can isolate some specific forms of non-erasable behavior and guarantee that other parts of your language have an erasure semantics. For example, I have the intuition that Haskell, *after* the type class resolution pass, respects the erasure property.

For a different point of view on a related matter, see Conor McBride's answer to "Why do programming languages use type systems?". He argues that we should give up erasure semantics and find new ways to enrich the runtime semantics using type checking or inference.

Edit: I now remember that I reached this page from your blog. Sigh.



PPS: thinking about it, it may be possible to solve the problem I mentioned with the monadic `return`. The idea would be, basically, that it's ok not to know what the intended monad is, and to pass a value as a member of an "unknown monad" until the context forces a specific monad. More specifically, the unknown monad should behave as a free monad, accumulating a tree of non-evaluated monadic operations -- of course this would have an efficiency cost. I have not seen that solution deployed anywhere, but I have not looked deeply into monadic frameworks in less-typed languages.
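Something along these lines, perhaps (a rough sketch of the idea, not a worked-out design; this is a hand-rolled Free type and runFree interpreter of my own, not the ones from any existing package): keep the computation as a tree of suspended operations, with `Pure` playing the role of the ambiguous `return`, and interpret the tree only once the context imposes a concrete monad.

{-# LANGUAGE RankNTypes #-}

-- The "unknown monad": a free monad over a functor f of uninterpreted
-- operations.
data Free f a = Pure a | Op (f (Free f a))

instance Functor f => Functor (Free f) where
  fmap g (Pure a) = Pure (g a)
  fmap g (Op fa)  = Op (fmap (fmap g) fa)

instance Functor f => Applicative (Free f) where
  pure = Pure
  Pure g <*> x = fmap g x
  Op fg  <*> x = Op (fmap (<*> x) fg)

instance Functor f => Monad (Free f) where
  Pure a >>= k = k a
  Op fa  >>= k = Op (fmap (>>= k) fa)

-- Once the context decides on a concrete monad m, by supplying an
-- interpretation of the operations, the accumulated tree is replayed.
runFree :: (Functor f, Monad m) => (forall x. f x -> m x) -> Free f a -> m a
runFree _   (Pure a) = return a
runFree nat (Op fa)  = nat fa >>= runFree nat

The suspended tree is also where the efficiency cost mentioned above shows up: every bind allocates another layer of the structure before anything is interpreted.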