## Confused Deputies in Programming Languages

There was an insightful discussion recently on the capabilities mailing list about confused deputies, and how various language constructs can inadvertently introduce confused deputies. For instance, Alan Karp devised a confused deputy innocently introduced by Java's package scoping. I then reproduced this confused deputy using OCaml's modules.

In any system with encapsulation of some sort, there is an inherent disconnect between a caller's permissions to an object, and the callee's permission to that same object. Whenever the callee has greater authority over an object it's given than a caller, and this authority is granted implicitly (by the type system, for instance), a Confused Deputy can arise. This authority augmentation is called "rights amplification" in the capability security literature.

For example, the abstract OCaml type Confused.O.t is automatically amplified to an out_channel upon calling a function in module Confused.O. This usage is safe however. The only problems arise with binary functions, where the second parameter is also implicitly amplified. One solution proposed on the list was to forbid such implicit rights amplification for all parameters except the first, thus forcing all amplifications to be explicit (cap-talk is generally object-centric, thus the emphasis on the first parameter only).

I wasn't in favour of placing this burden on all binary functions, so I proposed instead to forbid "package-scoped" functions. By this, I mean that only publicly accessible functions are visible even for modules within the same "package" (or scope/nesting/etc.). The OCaml vulnerability is only introduced because module Confused.B can access the internal Confused.O.write function, which is not publicly accessible outside of the Confused "namespace".

If this lax scoping were forbidden, the compiler writer would have to either

1. expose the Confused.O.write function publicly, in which case its obvious that writing to the billing file is possible outside of the compiler's module,
2. split the functionality for writing the log and the billing file into separate modules (the best solution),
3. parameterize Confused.B with a closure to write into a Confused.O instance, effectively making it a separately wielded capability.

In all cases, the authority being wielded is more explicit, and thus accidental vulnerabilities are minimized. My proposal may complicate abstractions provided as a set of co-operating modules however, so I have three questions:

1. How common is it to have two co-operating modules use each other's private functions?
2. Has there been any research on this or similar topics, or does this problem resemble any others that have been studied?
3. Can anyone think of any other possible solutions to this problem, preferably with a minimal burden on the user?

On a final note, there was also some discussion about creating confused deputies just using shared mutable state, so there is yet more evidence of the safety of purely functional programming.

## Comment viewing options

### My wife and I are co-operating modules...

...and how often we use each other's private functions is nobody's business.

### Illogical

Obviously, "how often [you] use each other's private functions" is at least your own business, and therefore not "nobody's business." Furthermore, it is certainly the business of your respective doctor(s), and indeed public health officials, what sorts of things you might do in private. And it's in the public's interest that medical and public health professionals behave responsibly. Thus the subject is in everyone's interest, albeit indirectly.

And if I'm not mistaken, the above is an illustration of Noetherian induction. String rewriting systems: Catch the fever!

I expected some type-wizardry to be able to prevent this innocent security vulnerability. :-)

### There is...

...I just haven't had time to dig into it, and the more general question of object-capability security enforced by the type system really needs more thought. I do believe that some combination of information-flow analysis and/or proof-carrying code can solve all of the problems solved by object-capability security—at least in the presence of a sufficiently powerful type system, probably including dependent types so as to capture the object-capability security community's key insight that it is the dynamic graph of references that is important rather than the static layout of, e.g. an access-control list.

Along with making concurrency safe and commonplace, I think this is the great challenge facing the design of the Next Mainstream Programming Language™. :-)

### Someone here recently

Someone here recently pointed me to this blog post on using Arrows to model information flow in capability systems. It looks quite promising.

As for my post in particular, I don't think it requires any sophisticated typing, but I was hoping someone would comment on whether my restriction to module scoping was too onerous, or perhaps point me to some more well-founded technique that can tackle the Confused Deputy problem, even if only as a side-effect.

I do believe that some combination of information-flow analysis and/or proof-carrying code can solve all of the problems solved by object-capability security

Actually, I've come to believe that capability security is subsumed by referential transparency, which doesn't require anything too sophisticated (Hindley/Milner + Monads, at least). If you maintain referential transparency, you gain all the benefits of capabilities that I can see, and then some. A key open problem is properly structuring the FFI to follow capability rules, and not introduce any ambient authorities.

### Capability security and referential transparency

Actually, I've come to believe that capability security is subsumed by referential transparency

I don't understand how this can be. Capability security is possible in a language with state. Referential transparency is not. Can you explain?

State is an essential property of programming languages, for reasons of modularity (as has been discussed before on LtU). Referential transparency is ok for parts of programs but it is much too restrictive to assume it for all programs.

naasking: Actually, I've come to believe that capability security is subsumed by referential transparency

PVR: I don't understand how this can be. Capability security is possible in a language with state. Referential transparency is not.

I don't see the contradiction here. He is claiming that RT is a sufficient condition, not a necessary one. And I tend to agree.

### RT is sufficient for capability security?

Ok, I understand the point. It may be so. But I am not convinced that RT and CS are talking about the same things. RT does not respect abstraction boundaries: it says that equals can be replaced by equals. The first "equals" might be something outside an abstraction boundary and the second "equals" might be something inside that abstraction boundary! CS above all requires to respect abstraction boundaries.

### I think the idea is

I think the idea is that the whole point about something being a capability is that it has some external effect. Hence any use of a capability is inherently impure, and RT code trivially CS.

Monads do not change this either. Computing a monadic action is pure by itself. Only executing it is potentially impure. But at that point you can choose the execution method, i.e. the interpretation of the monadic primitives - in other words, you choose which capabilities you provide. The RT code itself can never do that.

### Capability = designation + authority

the whole point about something being a capability is that it has some external effect

Whether the effect is external or not is orthogonal to capabilities. A capability bundles designation and authority. For example, it can designate an object and give the authority to invoke a particular method of the object. This is also orthogonal to whether the language has state or not. See section 6.4 in CTM for more explanation.

any use of a capability is inherently impure

This is false. See section 6.4 of CTM, e.g., for an example of a purely declarative object. References to such things are capabilities.

### Purity = ???? ; Profit!

I think you're using different values of "purity" - but one of them has some significant degenerate cases, as you highlight.

It's a common haskeller idiom to describe monadic code as being impure even if the monad in question has a pure implementation - the EDSL described by the monad being an impure language. But not all monads describe impure languages! The identity monad belongs in everyone's list of counterexamples to generalisations about monads, for example.

What monads (and EDSLs in general) do give you is 'delimited ambient authority'.

### This is the debate I was

This is the debate I was hoping for! Lot's of interesting points. :-)

A capability bundles designation and authority. For example, it can designate an object and give the authority to invoke a particular method of the object.

Memory safety is all that is required to bundle designation and authority. The problem with most languages, is that they then add additional machinery above and beyond memory safety which violate both RT and CS principles. As for the connections between RT and CS, RT provides stronger properties than CS, while losing nothing except convenience.

This is false. See section 6.4 of CTM, e.g., for an example of a purely declarative object. References to such things are capabilities.

Yes, but that's not the interesting use-case.

If an object is stateless, then identity is meaningless (like Joe-E's Selfless objects). Purely functional objects are in this class.

If an object is not effect-free, a RT language requires some way to tame this impurity, whereas CS requires only that the effect is controlled via an explicit, unforgeable reference. Regardless of the taming method used, preserving RT provides stronger properties than simply controlling the impurity via memory safety. Only in this class of objects is identity meaningful.

For almost all practical purposes, state can be tamed through data abstraction. A stateful system can be designed so that a well-defined property, called an invariant, is always true when viewed from the outside.

It's true that abstraction can help one reason about effects, but it's also true that preserving RT enforces such invariants and makes that reasoning even easier. For instance, seeing "IO a" in a function or module signature is an immediate flag that invoking this object will have some potentially global side-effect. Refining the types of monadic effects ("IO a" => "DelDir a", or "Ref a", or "SendPacket a" etc.) makes this even more fine-grained and powerful.

Preserving CS is the bare minimum though, and SML is almost there if it weren't for all the ambient authorities exposed in the Standard Basis Library.

RT does not respect abstraction boundaries: it says that equals can be replaced by equals.

I don't see the problem. From the object-centric view, a client can only interact with an object by sending messages, and if B and C both respond to the same messages, it doesn't matter if you've substituted C for B. This perfectly describes such "equals", and the situation is no different with RT.

CS is ultimately about controlling effects, and RT is all about that as well. RT just does it "better". :-)

### RT is not an unqualified good

RT provides stronger properties than CS, while losing nothing except convenience

RT is not an unqualified good. It is a trade-off: one gains some properties (equals can be replaced by equals) in exchange for losing other properties (in particular, modularity). Whether or not the trade-off is a good one depends on the case at hand. There are many cases when it is not a good idea to have RT.

RT languages cannot tame true mutable state. They always lose the modularity of true mutable state: the ability to change part of a program without changing the rest. Therefore programming languages should also allow one to program without RT, if the programmer so decides.

seeing "IO a" in a function or module signature is an immediate flag that invoking this object will have some potentially global side-effect

There are two problems with this statement. (1) Global state is never needed; local state is sufficient. (2) Frequently one needs to change the state of an entity without that being visible in the signature (modularity again!). Monads don't allow that. This problem is always finessed in an RT language by stopping the running program, changing the definitions, and recompiling the program. But the system in which the RT language is implemented always has true mutable state, and for good reason.

### RT is not an unqualified

RT languages cannot tame true mutable state. They always lose the modularity of true mutable state: the ability to change part of a program without changing the rest.

I don't see how this can be true. Monads are only one way to tame effects. I'm currently leaning towards uniqueness typing myself, though I'm investigating other substructural type systems as well. Monads require refactoring to add effects (a non-local change), but substructural types do not require non-local changes, unless you're propagating effects outside of the function's scope.

(1) Global state is never needed; local state is sufficient

It depends what you mean by "global". From the program's perspective, global means "anything beyond my control", ie. the outside world. Effects made in the outside world are clearly non-local. For instance, the file system, sent network packets, etc.

Frequently one needs to change the state of an entity without that being visible in the signature (modularity again!).

I don't find that convincing. The caller is making himself vulnerable by invoking the callee, so he would want to know whether the callee will actually terminate, for instance (termination is an effect). So there are many good use-cases for exposing callee properties in its type. I consider E's auditors a form of typing for instance.

So I don't think this is a case of modularity at all. If one ignores callee properties, one sacrifices safety. It's a viable trade-off, but that's not the same trade-off you're implying.

But the system in which the RT language is implemented always has true mutable state, and for good reason.

Or we're all running in a monad. :-)

[Edit: on RT being an unqualified good, I think anything that helps one to reason about a system is an unqualified good. Thus, RT is an unqualified good in my view. The means by which the user is exposed to RT is definitely not an *unqualified* good however.]

### State and modularity (again)

I don't see how this can be true.

The discussion State and modularity has an example that can't be done without true mutable state.

Or we're all running in a monad. :-)

Check out "The fallacy of the preprocessor" on page 318 of CTM.

### true == implicit?

That example shows you can only transparently add externally visible state to a module in a language where access to state is implicit. That's only an argument for implicit state if it's a good thing that the change is completely transparent.

That sort of change might be equally convenient but not transparent with something like effect polymorphism.

In the context of security making sure that sort of change is not transparent might help audit for information security.

### SML is almost there

SML is almost there if it weren't for all the ambient authorities exposed in the Standard Basis Library

Note that SML allows one to rebind any top-level identifier, making the original binding inaccessible to the rest of the program. So, in a traditional SML system, one can relatively easily replace the entire basis library. IOW, one can easily write an SML' dialect (providing a different basis library) on top an existing SML implementation.

For example, I used this technique in a smlbot (for IRC) to make it impossible to perform IO. Basically, a prefix code, that rebinds all IO capable modules to a dummy (empty) module, is evaluated before the user program.

Traditional, interactive SML systems (e.g. Poly/ML) provide a means to save the "world" and restart from a given "world", thus making it possible to create a dialect without having to evaluate the new basis library dynamically. Some dialects also provide library definition languages beyond the basic module language. SML/NJ's CM and MLton/MLKit's MLBasis system are examples of this. I know of two examples of using the MLBasis system to create a [replacement for/extensions of] the basis library. They can be found from MLton's library repository.

### I knew you could rebind

I knew you could rebind definitions, but I thought these were limited to "shadow" bindings, ie. a lexically shadowed binding, so it only applies to code under your control. If you're loading third-party SML code, doesn't it have access to the original top-level bindings?

What would be the new semantics of a rebound open_file if you wanted to forbid I/O?

1. But you shouldn't allow one to load SML code. You should only allow one to load SML' code. I really don't see how you could load (precompiled) third-party code without the risk of it having access to some ambient authority and without imposing a priori restrictions on the third party code. IOW, I think that there is a minor contradiction in your requirements. If the SML' language has an FFI, it must be designed so that a library cannot silently obtain access to foreign functions.

2. SML, as such, doesn't provide a means to load precompiled (SML or otherwise) code (SML, the Definition, does not provide for separate compilation). Interactive SML systems often provide a (non-standard) use function or directive to evaluate SML code from a specified file. The evaluation is (usually) done with respect to the current top-level environment. (It is like you would have typed the code at the REPL.) So, in such a system, if you first rebind top-level bindings and then use some code, it cannot access the original top-level bindings.

What would be the new semantics of a rebound open_file if you wanted to forbid I/O?

None (you can select which values to leave visible in a module) or whatever you want (rebind it to whatever value you want).

For example, the SML basis provides the function print at the top-level. It has the spec:

val print : string -> unit

If you don't want the rest of program to have access to it, you can evaluate:

val print : unit = ()

In a more sophisticated system, like the MLBasis system, you can completely hide top-level bindings. IOW, the new basis doesn't have to have a top-level binding of print.

### Restricting information access

Can't you use CS to restrict access to information in a RT setting?

### Capability security is

Capability security is possible in a language with state. Referential transparency is not.

That's not true.. all Haskell programs are RT, yet Haskell has true mutable state. The interface to mutable state in Haskell is certainly not perfect - I don't think state should be bundled in the IO monad - but I think a better RT interface could be designed.

On whether referential transparency subsumes capability security: it's clearly not strictly true, because Haskell has RT but not CS. The IO API would need to be redesigned as naasking says.

### Monads can not implement true mutable state

Monadic state does not have the modularity properties of true mutable state. This has already been discussed on LtU. See also section 4.8 of CTM.

BTW, there is nothing inherently wrong with true mutable state. It is an essential concept, and poses no particular problem when handled with care.

### Overstatement?

[Mutable state] poses no particular problem when handled with care.

Isn't that quite a bit of an overstatement? Just look at the semantics, meta-theory, and reasoning techniques for languages with and without mutable state. I dare say you won't find any other PL feature that can complicate matters as profoundly. Taming it still is a very hard research problem. For example, a denotational semantics for state with first-class functions is so complicated that it's hardly useful.

### Denotational semantics?

a denotational semantics for state with first-class functions is so complicated that it's hardly useful

Idem for a denotational semantics for concurrency. Doesn't that rather say something about denotational semantics?

### No overstatement

For almost all practical purposes, state can be tamed through data abstraction. A stateful system can be designed so that a well-defined property, called an invariant, is always true when viewed from the outside. This makes reasoning about the system independent of reasoning about its environment. This gives us back one of the properties that makes declarative programming so attractive. (Paraphrase of part of Section 6.2.1 of CTM.)

In fact, the real problem is not state, but global state. Local state, which is encapsulated inside a data abstraction, is all we really need and it has no particular problem.

### What poses problem when handled with care?

The problem with state is how can we be careful with it when it's invisible. In a language that makes effects explicit (e.g. monads, uniqueness typing, effect tracking) we know exactly what will happen if we can a function, but when state is invisible, there're no guarantees and reasoning about code becomes harder. Ditto for other effects like continuations, exceptions, space consumption. For example, the fact that (when effects are invisible) f x . g y is different than (\z -> (\w -> f x w) (g y z)) throws away many valuable properties that equational reasoning assumes. Such properties of languages with explicit state are valuable for writing correct code. If we start mixing implicit effects, suddently it's much harder to know what will happen when both occur. For example in any stateful language with effects there's no way to guarantee that when exceptions are thrown certain stateful operations are undone, it's very common in Java to have objects with broken invariants after exceptions happen, in Haskell we can use STM and be safe knowing that there are semantic guarantees related to correctness.

Calling state an essential concept just begs to know what essential is. Foundationally state isn't required, otherwise the lambda calculus wouldn't be turing complete. Your work points that it's a desirable fundamental feature for language composition, but it doesn't make it essential. The fact that it was already discussed on LtU doesn't mean that a correct understanding of this was reached, far from it there are many active research areas that consider state explicitly.

### Warning: may contain heresy

Calling state an essential concept just begs to know what essential is. Foundationally state isn't required, otherwise the lambda calculus wouldn't be turing complete.

I'm going to have to side with Peter, here and I want to support that view with a two-pronged argument: a practical one and a theoretical one.

From the practical point of view, there are problems that one wants to solve with software that are simply, irriducibly, inherently stateful. For example, any garden-variety online commercial application is inherently stateful in the sense that it needs to track non-monotonic changes in real-time information.

Now you can use the technique of RT to reduce the complexity of this problem down to the bare nub of of its essential statefulness, and this is a good idea from a complexity-reduction point of view, but you can't 100% eliminate statefulness from a problem like this that has inherent statefulness.

To move to the theoretical front, I'm going to make the more controversial point: I think that the LC does contain inherent statefulness, and that, by definition, Turing-completeness is a kind of statefulness.

Potential non-termination is inherently stateful in the sense that, on any given reduction step n of some arbitrary expression E, I can't in fundamental principle tell whether I'm going to ultimately terminate in some finite number of steps, or not terminate.

Now this might not sound like state, but hopefully everyone can see how it destroys RT: if the expression terminates, it will equal to, and hence substitutable for, some family of expressions representing its value (i.e. normal form). If it doesn't, then it can only be substituted for another non-terminating expression (if you are allowing that identification).

The catch is that you may never know which answer is true, so your whole logical system is stateful in the sense that it may be one in which E has a value already at some step n, one where E has a value at some step greater than n, or it may be system where E never gets a value.

I think this is the Achilles heel of denotational semantics or other mathematical models of computation, such as CT, because in math you always assume you can zip to the end of infinity and see what "really happens", whereas, in computation, you have to accept as a foundational idea of the field that you can't do that for arbitrary cases.

### I'm going to have to side

I'm going to have to side with Peter

I'm assuming you mean, you're siding with him on the necessity of state, since that seems to be the thrust of your post, and not on the "RT does not subsume capabilities" issue.

I think that the LC does contain inherent statefulness, and that, by definition, Turing-completeness is a kind of statefulness.

I think I agree. Non-termination is an effect, just like mutation, so there is a strong connection there.

### Go Team Peter!

I'm assuming you mean, you're siding with him on the necessity of state, since that seems to be the thrust of your post, and not on the "RT does not subsume capabilities" issue.

You're right as far as that response goes (I was responding to Daniel's post), but I also agree with him that RT and capabilities are orthogonal concepts, even if RT may moot the latter in some circumstances. ;-)

[small edit]

### I also agree with him that

I also agree with him that RT and capabilities are orthogonal concepts, even if RT may moot the latter in some circumstances. ;-)

What I'm really after, is what those remaining circumstances actually are. What properties can capabilities provide that RT does not?

### E.g.

What properties can capabilities provide that RT does not?

Imaging the difference in an unreliable distributed environment. RT can't be guaranteed, since a remote system might be down, or not get a message, or have been restarted, flushing the cache, etc. In this scenario, capabilities are necessarily stateful.

### Imaging the difference in an

Imaging the difference in an unreliable distributed environment. RT can't be guaranteed, since a remote system might be down, or not get a message, or have been restarted, flushing the cache, etc. In this scenario, capabilities are necessarily stateful.

Statefulness does not preclude referential transparency! Let's assume an SML with uniqueness typing:

type 'a channel{U}   (* {U} qualifies the channel as unique, so we can only read from it once *)
type 'a message = Data of 'a | Restarted | Failure | ...

(* read returns the pending message and a new unique channel for all subsequent operations *)
val read: 'a channel{U} -> 'a message * 'a channel{U}

let ch : string channel = ... in
Data x ->    (* compute with x *)
| Restarted -> (* remote service restarted *)
| Failure ->   (* some sort of channel failure *)

The above is referentially transparent, and also capability secure, and it's properties are stronger than CS alone.

### Transparent?

The above is referentially transparent, and also capability secure, and it's properties are stronger than CS alone.

I don't know; to my eye this set up is RT in the same way that sticking your fingers in your ears and saying "Nya, nya, I can't hear you" prevents you from hearing unpleasant truths. ;-)

Moreover, though this is compatible with capabilities, you don't actually show the capability implementation, and I don't see how you would get the capability behaviour for free in this instance just because this is "RT". This supports the notion that the RT part and the capability part are orthogonal (note the claim was they were orthogonal not mutually inconsistent.)

### Moreover, though this is

Moreover, though this is compatible with capabilities, you don't actually show the capability implementation

I think this a very odd statement. What is a "capability implementation"? Capability security properties are satisfied by simple memory safety, so what is there to show? You can't have RT without memory safety, so by extension, RT includes capability security (for pure language objects).

Capabilities have a few other requirements, namely taming ambient authorities (impure objects external to the language/system), but these requirements involve controlling effects, which is already satisfied by preserving RT. Thus, RT subsumes capabilities. :-)

### Que?

What is a "capability implementation"? Capability security properties are satisfied by simple memory safety, so what is there to show?

We seem to have lost each other somewhere.

Remember I was assuming a distributed system (which to me implies participants without shared memory). It is unclear to me how mere memory safety will solve the problem in such an environment.

Or maybe we are two people separated by the same language. ;-)

### Remember I was assuming a

Remember I was assuming a distributed system (which to me implies participants without shared memory). It is unclear to me how mere memory safety will solve the problem in such an environment.

Assuming an untrusted open network, which is a maximally hostile environment, you simulate memory safety by sparseness (unguessable identifiers). That's the best you can do in such situations, RT or not.

### Nobody expects the RT inquisition!

From the practical point of view, there are problems that one wants to solve with software that are simply, irriducibly, inherently stateful. For example, any garden-variety online commercial application is inherently stateful in the sense that it needs to track non-monotonic changes in real-time information. Now you can use the technique of RT to reduce the complexity of this problem down to the bare nub of of its essential statefulness, and this is a good idea from a complexity-reduction point of view, but you can't 100% eliminate statefulness from a problem like this that has inherent statefulness.

I was never arguing against state, the issue I see is with invisible state. There's a fundamental difference between causing and not causing an effect, for example mutating a piece of state, this can't be denied, so the question is how can the programmer know that a piece of code is safe to share or isn't? Without language support there's no way to be sure, we end up doing unnecessary work (e.g. defensive copies of mutable data structures) to avoid possible problems, overspecifying order of evaluation, introducing bugs. Invisible state means that every function call may cause state effects, visible state just show which ones are state dependent. There's a huge difference between f :: a -> b, f :: a -> IO b, f :: IO (a -> b) and f :: IO (a -> IO b), invisible effects blur those and treat statically as the same whereas semantically they're different.

To move to the theoretical front, I'm going to make the more controversial point: I think that the LC does contain inherent statefulness, and that, by definition, Turing-completeness is a kind of statefulness.

Potential non-termination is inherently stateful in the sense that, on any given reduction step n of some arbitrary expression E, I can't in fundamental principle tell whether I'm going to ultimately terminate in some finite number of steps, or not terminate.

You're mixing state with non-termination, which are diffent kind of effects (just to nitpick), so let's use a correct terminology and just talk about effects. As you can see I talked about effects in general, so my argument is about invisible effects. I as a programmer care a great deal if a piece of code terminates or not, but the syntatic interpretation of most languages ignore this property, so I have to test a lot (hoping to cover all the corner cases) and prove it correct (assuming that I don't commit mistakes). If the languaged tracked the distinction my job would be much easier. I'm in the camp of "Turing-complete is (for most apps) overrated, just give me data and codata!".

In theory code with effects is different from code without effects and in practice I do care about it. My daily work is in Java (doing "enterprisey" and web apps), so I have to spend time figuring out if I have to make a defensive copy of a collection before passing it as a parameter to something (because we never know if the code will change it or not), having to do code reviews to see if the code I wrote is exception safe, checking if my recursions and loop are terminating or productive. Effects are necessary, but not everywhere, and if they are invisible there's no way to tell where the effects are.

### Copying collections

My daily work is in Java (doing "enterprisey" and web apps), so I have to spend time figuring out if I have to make a defensive copy of a collection before passing it as a parameter to something

I can sympathize with that. However, I would argue that the problem there is with Java's pervasive default mutability and lack of comprehensive support for immutable data. A sane imperative language like SML offers comprehensive support for immutable data. In SML, unless you really want to allow others to mutate your data, you just use immutable data. I don't think I've ever had to just copy mutable data in SML for safety like I've done in Java and C++, for example.

having to do code reviews to see if the code I wrote is exception safe, checking if my recursions and loop are terminating or productive

Note that concerns like these also apply to code written in a pure language. For example, it is easy to write code in the IO monad in Haskell that isn't exception safe (e.g. leaks handles or leaves a mutable data structure in an invalid state). Purity also doesn't automatically make recursion terminating or productive.

Actually, I just recently read a comment (not for the first time) from a paper (tech report Testing Properties of Generic Functions by Jansson and Jeuring) that they had to catch exceptions from "pure" (non-monadic) code in (Generic) Haskell. But, of course, Haskell does have effects beyond non-termination in "pure" code, so that doesn't contradict your point.

(minor edits)

### We are the knights who say CTM!

I was never arguing against state, the issue I see is with invisible state.

Then perhaps we are all in agreement. An essential benefit of viewing things through the CTM hierarchy is the insight that you can have a system that, taken as a whole, is stateful (or if you prefer effectful), but that, by restricting your use of various kernel language abilities, you can help make the inherently effectful parts more visible by allowing effectful abilities only where they are needed.

You're mixing state with non-termination, which are diffent kind of effects (just to nitpick), so let's use a correct terminology and just talk about effects.

Since one good nitpick deserves another ;-), though I will grant that what you propose is a consistent terminology, and fairly standard in certain contexts, at the foundational level I was discussing, I don't think this distinction makes a difference.

This may be the source of our seeming "disagreement": we are probably focusing on different levels of abstraction.

### *witty subject goes here*

Then perhaps we are all in agreement. An essential benefit of viewing things through the CTM hierarchy is the insight that you can have a system that, taken as a whole, is stateful (or if you prefer effectful), but that, by restricting your use of various kernel language abilities, you can help make the inherently effectful parts more visible by allowing effectful abilities only where they are needed.

I don't think we all are in agreement. From previous discussions with Peter van Roy (unfortunately my goggle-fu revealed nothing, shame on me) his usual exemplo is a module that provides a function (say fibonacci) and then updates the module to make it memoized without changing the callers. So it's implicit state AFAICS, which I classify as evil and unclean. No issue with CTM hierarchy thingy, it was it that made me focus on calculi instead of full languages.

[On edit] I found a piece relevant to this discussion.

### Depends on the problem

then updates the module to make it memoized without changing the callers. So it's implicit state AFAICS, which I classify as evil and unclean.

I don't recall that example, but I can't really know how to evaluate the relative evilness of the solution, without knowing the problem it solves.

Let's say it is intended to be a unique key generator for some distributed app: would that be an okay use of the "evil"? (I'm not sure how else you would solve this problem without shared state...)

If you are arguing for language mechanisms that help to enforce and make visible such statefulness, I really don't think you'll get an argument about that, especially if you can come up with a mechanism to do it that everyone likes. ;-)

I really do think there is more agreement than is apparent: one side says "wisdom is using as little state as possible", the other says "state is okay, but use it wisely (i.e. as little as possible, but no less)".

To my mind the difference is merely one of emphasis and perspective. How little state you can get away with depends on the problem you are trying to solve...

### doubleplusungood

Let's say it is intended to be a unique key generator for some distributed app: would that be an okay use of the "evil"? (I'm not sure how else you would solve this problem without shared state...)

Which role would such generator play? IME hiding the network is wrong, so every remote operation would be already tagged, so code like getRemoteId :: Address -> IO Id would already assume some sort of IO happening, it wouldn't matter if the remote app used state or not (as things like network failure, delays, etc., are necessary to handle). The id generation code can be as simple as a makeId :: Id -> Id function and have the ids explicitly threaded.

I really do think there is more agreement than is apparent: one side says "wisdom is using as little state as possible", the other says "state is okay, but use it wisely (i.e. as little as possible, but no less)".

I updated my post (the one you replied to) with a link to his argument. PVR's posts imply that the discussion ended and state is known to be necessary, while I disagree with that assertion and believe that the discussion is still open.

### Love Big Brother

Which role would such generator play? IME hiding the network is wrong, so every remote operation would be already tagged

I'm getting the feeling that you (and maybe naasking too) are thinking about all these issues from a language-specific viewpoint. You seem to be assuming language features that, though I won't disagree are nice to have, are not the only way to do things, and certainly don't describe the boundaries of the solution space for the given problems in any fundamental way.

Solving some problems does require state, and making it visible by labelling or segregating it, however good that may be, doesn't get rid of it.

Is all of this just covert language advocacy, supporting your language(s) of choice and its way of doing things and critiquing those of PvR's language(s) of choice?

Perhaps a valid discussion to have, but not really the perspective I'm coming from in this discussion.

I updated my post (the one you replied to) with a link to his argument.

Ah yes, I remember that thread well. Interesting evidence that I haven't always seen eye to eye with PvR. ;-)

The particular problem he gives is, I think a good one: you have to instrument parts of a process without making too many non-local changes to the call chain. Think for example if you don't actually control the source for some of the steps in the chain (e.g. you are using a 3rd party library.)

Would your solution be "I can't do that; it's evil"?

### I'm getting the feeling that

I'm getting the feeling that you (and maybe naasking too) are thinking about all these issues from a language-specific viewpoint.

That's probably a fair conclusion. Languages are a powerful descriptive and reasoning tool, and I think that everything can be described by some language. Ultimately then, we are interested in languages that provide the most general, most powerful, and most concise tools to describe and reason about language terms.

Solving some problems does require state, and making it visible by labelling or segregating it, however good that may be, doesn't get rid of it.

I don't think anyone's trying to get rid of state. Daniel and others are merely trying to make its presence explicit so that its effects can be reasoned about. Capabilities are useful for reasoning about the effects of programs as well, and this debate kicked off when I essentially suggested that preserving referential transparency was more effective than capabilities for such reasoning.

### 2 + 2 = _|_

I'm getting the feeling that you (and maybe naasking too) are thinking about all these issues from a language-specific viewpoint. You seem to be assuming language features that, though I won't disagree are nice to have, are not the only way to do things, and certainly don't describe the boundaries of the solution space for the given problems in any fundamental way.

No really, I didn't understood what was the issue about the generator. I mean if it's supposed to be used inside a particular node of a network to generate ids that must be unique within itself, if it generates ids that should be unique to a set of notes or whatever. Each case has different problems, the ones which should be internal are different than those that should be external. What I said about tagging the network interface is that f :: a -> b is a local operation and f :: a -> Net b is a remote operation, without being able to disguise a remote as a local one to achieve "network transparency". For a concrete example in the app I'm working on I generate unique ids by using SHA256 of the content I'm identifying, as I don't want to rely on guid and I need to be able to consistently obtain the same id on different nodes of my app. If I was issuing ids to a network of (possibly untrustful) nodes I would just cryptograph some tuple of information to be able to generate unguessable keys. But I don't see how those relate to the question

Solving some problems does require state, and making it visible by labelling or segregating it, however good that may be, doesn't get rid of it.

Is all of this just covert language advocacy, supporting your language(s) of choice and its way of doing things and critiquing those of PvR's language(s) of choice?

I'm not arguing that state is useful or necessary, again I'm against invisible effects. My issue is with just hand-waving this discussion as already solved. I see this discussion as an important one to have, today I'm less sure of my position that I was years ago when I argued with PVR about it on the pragprog list, IMO it's [effect tracking] an interesting PLT problem.

The particular problem he gives is, I think a good one: you have to instrument parts of a process without making too many non-local changes to the call chain. Think for example if you don't actually control the source for some of the steps in the chain (e.g. you are using a 3rd party library.)

Would your solution be "I can't do that; it's evil"?

Let's turn this problem around, if I'm the third party library author I have some contract to fulfill to my clients and a contract I expect my suppliers to hold. If my code calls a library in a way that isn't exception safe (e.g. I open a file, call the library, get some result then enter a try/finally block to use both and close the file) I depend on the contract that the library won't throw any exceptions, on what control effects the library raises. Perhaps my type system helps me keep track of what I open and close (e.g. uniqueness types), so the type system also assumes no effects. If I'm working in a machine with tight resources my language may keep track of space/time, so when I call some library I know that it'll use n words of memory and m clock ticks (or whatever), otherwise I exhaust the resources or miss deadlines: I depend on which time/space effects caused. Most of the time I don't depend on those, so I can just say that whatever effects the libraries cause I cause too. I'm looking for a solution to this where I can say how much I require from the code I use and how much I ensure so my clients know how well I play with my dependencies and how well I work. This needs to be parametric, so I can say map :: (a -> b ^ c) -> [a] -> [b] ^ [c] (using a^c to mean that the evaluating a causes the effect c) and don't need to worry about what kind of effects it raises. In part this is similar to capabilities, because I will either create the code I need to use (therefore ensuring it causes only the effects it does) or receive it from my client as parameters/capabilities, so I'm subjected to whatever effects are bundled with its operations. All of it using a parametric framework with some sort of subtyping-like mechanism will ensure that if I don't care about some effects I'm parametric about those (which are free theorems). With some more pseudo-code

module a(b',c)
doA = (b' (instrumented c)).doB
instrumented c = ...
module b(c)
doB n = c.doC n + c.doC n
module c
doC n = n * 2

I believe that such kind of explicit passing of references and tracking of effects would help to improve programming. This is different from current approaches, because we let effects happen and don't sequence them using monads or unique types, but some sort of semantics extension (e.g. do (io, left to right) {...code goes here...}). But I still only have some pieces of this puzzle, I don't even know if this approach is good or necessary, but I don't think the discussion ended and invisible effects are a feature.

Today I think that the problem PVR posed is great and needs to be solved, but in a way that don't break any code. With invisible effects the code may break silently (the effects caused may harm any of the intermediate libraries or may mix in unfortunated ways with other effects, like unwind-protected and call/cc do) so I think this is not a solution but a temporary patch. Now I don't have a solution to this (but I'm working on it ;) but I recognize that there are some hard problems that need to be tackled and calling this a non-issue is harmful. OTOH as we wait for the correct solution we just put the damn counter in there and test it until we are confident we did't break anything.

Just to be clear, we need to be careful when we say "we can solve this problem with X" to be sure that we don't introduce new, harder, problems with it. I'm don't believe that any of the available solutions today are good. We need to keep the discussion going until we do have a solution. Perhaps the solution is stop trying to use classical logic with software (and have a correct/wrong binary value) and use a bayesian approach to correctness (so far the evidence adds up to 99.95% that this works), so we could ditch static type systems and use some sort of reflexive evidence gathering approach. But the discussion must go on.

[on edit: fixed a few typos]

### Re: The discussion must go on.

while i am only slowly/barely/notreally keeping up with the thread, i'd like to encourage anybody who is trying to question assumptions.

### In an ideal world...

For those of us who don't have the legacy issue to worry about, the solution is likely akin to universal dependency injection: code that's polymorphic in the environment it's run in, so that you can run it in a stateful environment with the back channels in place when you need to. In Haskell this tends to mean monads, and taking computations instead of functions as parameters where appropriate. It's already an idiom for error-handling, though partly due to a historical accident that I and many others have had cause to complain about in the past!

### How can effects be both invisible and visible?

With invisible effects the code may break silently

Yet sometimes the effects have to be invisible (which is my point). How do we solve this? I think one solution is for a component (or function) to have several interfaces. One interface (the functionality) doesn't change when state is added (like the memoization example). The second interface *does* change and through it we can see how the function uses state internally.

### I propose we agree to disagree

Yet sometimes the effects have to be invisible (which is my point).

I disagree with this premise. If the language doesn't support static checking of effects then we have no other choice, but the program has a different semantics depending on which effects it causes. As I wrote in another post, all these are semantically different: f :: a -> b, f :: a -> IO b, f :: IO (a -> b), f :: IO (a -> IO b), providing a common interface to them blur the semantic differences. If I call a function expecting it to have one of those semantics and it happens to be one of the others my program will have a different meaning than the one I expected. What we need is a way to signal that a code is parametric on the effects of it's suppliers so when it's module is instantiated we can bind to suppliers with the desired effects and those will just propagate to its client. In pseudo-code:

module B(C :: {Effect E; doC :: Int -[E]-> Int})
doB :: Int -[E * E]-> Int
doB n = C.doC n + C.doC n

Of course the current state of art doesn't have a clean solution to do this, but this don't make the problem disappear or make the current hacks become correct solutions: the invisible effects change the meaning of the program and most often than not (IME) it causes errors when the hidden effects interact with other effects. I agree that simple things like instrumentation are mostly harmless, but even things like memoization can get tricky (when we need some policy to remove unused entries and have things like weak references and friends) and if the language allows hidden effects the cat is out of the box and effects can be everywhere. That is the worst thing with such solutions, the signal (i.e. where the effects are) gets lost in a sea of noise (i.e. every function is a potential source of effects).

### Sounds sensible

to me, on the face of it, fwiw. i certainly agree that plenty of bugs come from people not doing a good enough job making sure effects are done right. any thoughts on how compact / automatic the process of writing out the types/kinds/annotations for the effects could be? if it becomes a ton of new ASCII that the programmer has to write it could easily be seen as too onerous. what is the closest one can come with available systems today?

[edit: is dynamic binding in Haskell at all useful? er, i guess i was looking at that because i thought it was related to state and modularity's problem statement.]

### Invisible effects only under semantic equivalence

Yet sometimes the effects have to be invisible (which is my point).

I think Daniel's point is that effects are only invisible if they do not affect the semantics of the program. Even changing the time and space behaviour of a module, as with memoization, can be an undesirable semantic change (think: languages that provide control over resources). Without being able to reason about these effects, we are reduced to statistical trial and error to reason about program behaviour.

exactly

### Invisibility + reasoning: it's possible to have both

Without being able to reason about these effects, we are reduced to statistical trial and error to reason about program behaviour.

I never said that one shouldn't be able to reason about the effects. In my last reply, I propose a solution that allows one to have both the invisibility *and* to reason about the effects.

Wat baten kaars en bril als den uil niet zien en wil?

### Reasoning globally

Module A depends on B which depends on C. If we add effects to C that are invisible to B how do we know that B will still function correctly? I agree that we can reason about C as a unit, but it invalidates our reasoning about B (e.g. it relied C's purity). If B is parametric on C's effects then we have no problems, but if requires that certain effects don't happen, then we have a problem. If B has effects of it's own, also hidden from A, neither A nor C are able to know if C's effects are safe when mixed with B anymore.

### Why effects but not other properties?

There are plenty of ways to screw up B by changing C without changing the types of C, unless the behavior of C is completely described by its types (we already have such a description: C's code). Can you explain why you think that effects should be mentioned in the types, while other things that affect the behavior of the program should not?

### .

The original problem was changing C to add some sort of effects that would be invisible from B but exposed to A via another interface. Also if the types are unchanged and no effects are added B can't be broken by any possible changes in C. That's different from saying that it's behavior won't change.

### What's the difference?

What's the difference? I understand that the change of behavior could be intended. Isn't it possible that something in C is changed that violates B's assumptions? For example an assumption about the return value of a function in C (e.g. that a sort function is a stable sort), or about the time complexity. From an end-user perspective the software would be broken. Does broken have another meaning in the context of programming languages?

### .

For example an assumption about the return value of a function in C (e.g. that a sort function is a stable sort), or about the time complexity.

Within a pure system the types are the allowed assumptions. If the type system is powerful enough to express a stable sort, you can only rely on it if the function provided specified it. If you rely on assumptions that the type system doesn't provide you're on your own, like depending on invisible side-effects. For example if B depends on a stable sort then either C must give some proof of it (e.g. the type system specifies the stable sort) or it must accept whatever C gives it and do the checks itself. It's important to separate what we think our program do and what's its semantics. Using the stable sort example if C provides sort :: [a] -> [a] we may think that the resulting list is sorted, but the semantic also includes the cases where sort is the identity function, the infinite possible constant functions, etc., so we can't be sure of anything other than it being a list of a.

Looking at time and space from a semantical perspective we can see those as effects too, so in a more rigorous setting we should treat them as we treat other effects (if we want to reason about them). As usually we don't care about space or time (but we can and should IMO) we don't do such analysis or just do them outside of the language.

Does broken have another meaning in the context of programming languages?

If we reason about B in isolation we can define its semantics IRT the type system and the language semantics. The ability to reason about modules in isolation is essential to software development, otherwise we can't handle the resulting complexity of whole world reasoning. If we reason about B using the interface provided by C, but this reasoning doesn't hold when we use a different (but conforming) implementation of C, then our reasoning is wrong (i.e. broken). Actually the reasoning was broken because the semantics accepts too much (e.g. in Java any piece of code can throw unchecked exceptions so it's the semantic of every program that it can either do what it should do or end with an exception, in Haskell every computation can diverge, so divergence is a valid result of all computations). There are some techniques to reason about code even when the semantics accept undesired results (e.g. this paper) but we really can do as much as the language semantics let us do.

The essence of this argument is that we can't reason about what we don't know and if the language allows unknowable things then we can't reason about anything at all, other than running the program under a controlled interpreter and seeing the results.

### How much detail in the types?

So the things you can know about C is C's type. For example if C has a sort :: [a] -> [a] we cannot know that the list sort returns is sorted. You can do this with a more powerful type system, e.g. sort :: [a] -> SortedList a. But this type says nothing about its time complexity. Maybe sort just tries all permutations of [a]. sort should have type sort :: List a n -> SortedList a n, O(n log n). This still does not tell us everything about sort. To know everything we have to encode the implementation of sort in its type. This doesn't seem right to me: the reason we have modules at all is to hide complexity (so it's good that some things are unknowable). What's the right way to think about this? How much detail should the type provide? Multiple levels of detail, i.e. multiple types for different kinds of reasoning; one that says that the list is sorted, another that the sort is stable, another that it will run in O(n log n) time?

### interesting question

sure, rope is useful, but one can tie oneself up in it if not careful :-)

### Struggling with multiple levels of detail

I've been dealing with this problem myself recently. I've come to the opinion that keeping a "database" of properties in the type is the right approach.

An example for a "split" function that I wrote eventually ended up with the following types. They types were added by accretion. I never set out to put so much in the type but I kept needing different aspects or combinations of aspects for further lemmas/theorems or further strongly specified functions. I found it to be easier to throw everything in, and throw out what I didn't need later.

(* (CharIn c s) means that c is in s *)
Fixpoint CharIn (c:ascii) (s:string) {struct s} : Prop :=
match s with
| EmptyString => False
| String c' s' => c = c' \/ CharIn c s'
end.

(* Every char in s satisfies f, or s is empty *)
Definition all_sat (f : ascii -> bool) (s : string) :=
(s = "" \/ ~s = "" /\ forall c, CharIn c s -> f c = true).

(* (terminates f s) means that s is either empty, or the first char
does not satisfy f.  It is a string that as the second part,
terminates a split. *)
Definition terminates (f : ascii -> bool) (s : string) : Prop :=
match s with
| EmptyString => True
| String c s => match bool_dec (f c) true with
| left Htrue => False
| right Hfalse => True
end
end.

(* the splitl function *)
Definition splitl : forall (f : ascii -> bool) (s:string),
{ p : string * string | (fst p) ++ (snd p) = s /\ all_sat f (fst p) /\ terminates f (snd p)}.

I'm not including the actual implementation of splitl, but you can see from the type that it is specified to give a pair which when appended are the original string, every single element of the first string satisfies f and the second string is either empty, or the first element doesn't satisfy the function.

So we ended up with a conjunction of properties that are all satisfied. You are free to throw out the parts you don't care of, or throw the entire property proof away easily and just use the element by destructuring the sigma type { x: T | P x }.

### That's a very nice

That's a very nice specification. Do you prove that the implementation of splitl behaves as specified or is this specification accepted as true?

It feels like the distinction between the code and the type becomes smaller as the specification becomes more precise. The specification is more and more like the code itself. You could say that the code is the most precise specification possible, but it is often not very useful as a specification. For example, it probably isn't entirely obvious that (fst p) ++ (snd p) = s holds if you read the code of splitl. Therefore it seems to me that it's sometimes desirable to have a weak specification like (fst p) ++ (snd p) = s instead of a full specification, like the code of the function. If you're trying to determine how fast a function runs it's more useful to have its time specification, like O(n), than to have its implementation, even though you could deduce the O(n) time from its implementation.

-------

Which language are you using? I am able to understand it except this part: bool_dec (f c) true, in the terminates definition.

Why isn't the match statement like this:

match f c with
| True => False
| False => True
end

-or-

not (f c)

-------

Most times you are interested in relationships between functions rather than types of values. For example if you have an implementation of pairs you are interested in:

first (pair a b) = a
second (pair a b) = b

Rather than the types of the values produced by pair, first and second. And for dictionaries you want to know that:

d = some_dict
lookup (insert (x, y) d) x = y

### Back to Coq et. al.

It feels like the distinction between the code and the type becomes smaller as the specification becomes more precise. The specification is more and more like the code itself.

so once again we see that we should all be writing everything in Coq?

### Strong Specification

The language of the example is Coq. The specification has a proof of correctness associated, which I didn't include because it required several lemmata which didn't seem relevant. The complete code is at:
Substring.v.

Decidable boolean equality is implemented as a library in Coq and it includes a proof of the truth of the statement, not just which computational branch to take, which makes it a stronger specification than a normal equality/disequality function would have. From the Coq library:

Lemma bool_dec : forall b1 b2 : bool, {b1 = b2} + {b1 <> b2}.

So we can in all cases decide if two booleans are the same or different. This is provably terminating and gives us power in later proofs, since we can use use the proofs that were necessary to produce this type as witnesses to more complex types later. In the case you are looking at though, I think you are correct in that it doesn't make any difference and a normal boolean would be sufficient as the proof (Htrue/Hfalse) is never used.

### Monads can not implement true mutable state

Monadic state does not have the modularity properties of true mutable state. This has already been discussed on LtU. See also section 4.8 of CTM.

I don't have a copy of CTM - can you (or anyone) provide a link to the previous discussion being referenced here? Is this just a reference to the fact that you can't e.g. provide a memoizing implementation for a value of pure type?

Mutable state is always going to be an essential part of any computation machine, but it seems to me that the language we use to encode and discuss stateful behavior should be referentially transparent so that we can describe and reason about the state change (or concurrency) from within the language.

I don't have a copy of CTM - can you (or anyone) provide a link to the previous discussion being referenced here?

I believe the first such discussion (on LtU) was State and modularity. I think the subject has also come up in comments in later threads, though.

### Thanks

I agree with Frank Atanassow's comment in that thread.

### State is far from

State is far from inextricably bundled in the IO monad - there's just one particular form of reference that's found there, and you can almost always use STRefs instead there, as in this ghci session:

Prelude Control.Monad.ST Data.STRef> r <- stToIO $newSTRef "foo" Prelude Control.Monad.ST Data.STRef> stToIO$ readSTRef r
"foo"

### Capabilities and RT

The above discussion is interesting, but there is at least one obvious security concern which isn't addressed well by RT (and may be hindered by it)--preventing an attacker from reading data which he is not authorized to see.

Such an attacker typically will not want to mutate anything--ideally (for them) they will leave zero traces of their presence.

Capabilities, or other means of enforcing modularity, can mitigate this sort of attacks by a) denying access to information to those who lack the appropriate credentials, and b) ensuring that (authorized) access to sensitive information is logged for subsequent audit or forensic analysis. Note that in the latter case, you want an action which is normally side-effect-free to have a side effect (a record of the access).

### Capabilities are unforgeable

Capabilities are unforgeable descriptors providing access to an object. Lambda names in the lambda-calculus are also unforgeable references to objects. Thus, lambda names are capabilities, and protection from reading is achieved by withholding the lambda designating the secret, just as protection from reading is achieved by withholding the capability designating the secret.

Hopefully, my clarification below explains this sufficiently.

### What is capability security and how can it be compared to RT?

I think there is some confusion in this debate, including in my own comments. Upon posting the link to this debate on cap-talk, David-Sarah Hopwood wrote back to agree with PVR, with the argument that RT does not imply encapsulation, where capability security requires it. I found this statement somewhat confusing, and it resulted in the 'aha!' moment where I realized how we are all talking past each other.

I clarified my position in a reply, and I just posted a follow-up to explain why the distinction I am making is applicable. I'll reproduce this last reply here:

Sandro Magi wrote:
> When speaking of referential transparency, I am of course speaking of it
> in terms of programming languages, and the lambda calculus at a minimum.
> A lambda is encapsulated.
>
> In Section 9.1, footnote 1 of MarkM's thesis, he defines the
> "object-capability security model" as:
>
> "Our object-capability model is essentially the untyped call-by-value
> lambda calculus with applicative-order local side effects [...]"
>
> Preserving referential transparency eliminates the dependency on
> evaluation order, while preserving all of the language's other
> properties that I can see. There might be a small hit to expressiveness
> due to more tightly controlled side-effects, but I think this is largely
> compensated for by the increased reasoning power over effects.

Upon further thought, this requires further elaboration. Progress is made when building on existing knowledge, and this process presumably results in fine-grained distinctions which are important to deep understanding of a subject.

When considering capabilities from a language perspective, there are not as many novel properties as there are from the OS perspective. As I quoted above [1], the object-capability model is effectively the lambda-calculus. The novelties of object-capabilities in this setting aren't protected, unforgeable descriptors, since the lambda calculus already has lambda names. The "reference graph = access graph" equivalency is already present.

The novelty of capabilities for programming languages is thus in *controlling access to effects* via the reference graph. From this perspective, techniques to preserve referential transparency in the lambda calculus and capability security restrictions to languages built on the lambda calculus to control "world" effects are directly comparable.

MarkM's thesis [1] distinguishes "primitives", which are internal to the language, and "devices", which are external. Most languages provide ambient access to devices resulting in all the current problems in isolating subsystems.

Both capabilities and RT techniques impose restrictions on the use of effects which affect the "world", ie. via devices. In Haskell, this distinction is not very fine-grained at the moment, as most effects simply live in the "IO" monad, but there is no reason why these effects couldn't be made more fine-grained, ie. IO => DelDir, SendPacket, etc.

When combined with types, RT is a powerful reasoning tool. RT further tends to increase modularity and reuse, which helps with code correctness, so there are compelling software engineering benefits as well.

### How to derive capability security from RT

On cap-talk, Kevin Reid pointed out that confining effects via a monad like IO merely enforces a static access control policy. I think the reason for this is that Haskell is somewhat sloppy here. I'll produce an expanded version of that reply here, since it concerns language design.

If I understand it correctly, the type of Haskell's 'main':

main :: IO ()

This signature implies an naive ontology where 'main' is the sole program in all existence. Of course, this is clearly not true, as 'main' is launched by another program, which in turn was itself launched by a program, and so on and so forth. This signature for a program would not be viable if we were all running Haskell-based operating systems for instance (please correct me if I'm wrong here). Thus, 'main' executes within a pre-existing World, inducing a particular side-effect, and thus the signature should be something like:

main :: World -> IO ()

Where World is a fully abstract signature provided by the runtime. The runtime can bind FFI functions, like openFile, via World:

openFile :: World -> FilePath -> IO Handle

Since World is simply a signature, a function could conceivably build another World' conforming to the World signature, and run a new program within World'.

And thus we have achieved capability security. Relying on a single World type is less than ideal, which is why I still think designing a proper FFI is still something of an open problem, but it's sufficient for capability security.

Since World is simply a signature, a function could conceivably build another World' conforming to the World signature, and run a new program within World'.

To pick up from earlier in the thread: what if there is not one authoritative world you working with? This is the normal situation in a distributed application (which in real life most applications are).

In such a case, RT is not reliably possible across the "collection of worlds", and so capabilities are necessarily something different from RT.

### In such a case, RT is not

In such a case, RT is not reliably possible across the "collection of worlds", and so capabilities are necessarily something different from RT.

There is always a single underlying World, otherwise no communication channel could exist between them. This channel between World' and World'' is built over the common World.

[Edit: I think it's clear that the underlying physical world is the canonical World, and all World'' inevitably make upcalls to their parent World' which eventually bottoms out at the physical World. Hopefully, that clarifies things.]

### When is a world not a World?

There is always a single underlying World, otherwise no communication channel could exist between them. This channel between World' and World'' is built over the common World.

I think we are getting closer to the source of our different perspectives.

Normally, when we think about RT, we are implicitly considering some World which is sufficiently specified and accessible so that we can make the RT guarantee.

To continue with your terminology, normally how we do this is by defining a parent World that makes a contract with our RT world to handle all the messinesses of the physical world, and only present them locally in our world as though they too are RT. (This is one way to think about how monads provide impurity in a pure world.)

Now in a truly peer-to-peer distributed case, the parent World is almost purely conceptual, since there is no canonical representation of the shared World all in one place that can make an RT contract with its children.

And so inevitably, either your RT world has to stick its fingers in its ears and pretend it can't hear anything that isn't RT, which strongly limits how much of the parent World it can see, or it needs to be realistic and skeptical about its parent world and use a more rigorous implementation of capabilities to communicate with it than the simple RT discipline it uses internally.

### I don't think I follow

I don't think I follow your argument. From my understanding, a monad can be conceptualized as encapsulating the hidden state of the World, such that all computations done within the monad simply sample the current state of the World (outside), and transition the local (inner) World'(1) to state World'(2).

Obviously there are limits to our ability to see the World, and these limits are reflected in the APIs that accept a World.

The canonical representation of World in a p2p distributed case, sufficient for our purposes at least, is the network connecting the two peers. Are you suggesting that a monadic interface, or perhaps some type and effect system, cannot adequately model this situation?

### The world is a vampire

and transition the local (inner) World'(1) to state World'(2).

This again assumes that your monad system has a complete and authoritative representational version of the current World somewhere.

The canonical representation of World in a p2p distributed case, sufficient for our purposes at least, is the network connecting the two peers

And how in general does that network ensure the RT contract?

Are you suggesting that a monadic interface, or perhaps some type and effect system, cannot adequately model this situation?

For some realistic values of "this situation" and "adequately model", yes.

Which comes back to the original point: that there are situations where capabilities cannot be subsumed by RT, making them an independent concept with their own assumptions, uses and guarantees.

### This again assumes that your

This again assumes that your monad system has a complete and authoritative representational version of the current World somewhere.

I don't understand. The FFI gives you as much access to the outside world as is accessible via programmable sensors and devices. Are we talking about two completely different things?

And how in general does that network ensure the RT contract?

Let's try a different tack: Haskell right now supports concurrent programming over channels via monads. How does this differ from the p2p scenario you proposed?

### Nitpick

It's not main that should take a World, it's a hypothetical runIO function. The reason for this being that main really is just a designated entry point and is otherwise not special in any way whatsoever. Your scheme would (hopefully!) prevent recursive calls to main.

### Confused

The signature 'World -> IO()' doesn't make much sense to me. Maybe World is just poorly named, but the whole point of returning an IO monad is to hide the world. Also, I don't think the problem you're trying to solve here actually exists. What does that signature get you that adding 'getWorld :: IO World' wouldn't?

More to the point, why have a World type at all? The runtime system that runs main is going to run it in a context - there is no requirement that the "callbacks" bound in a monad will always deterministically get the same results. That's the whole point :).

If you want to have fine grained dynamic capabilities, put them in values. File handles are a simple example of that. RT makes sure those don't leak. As always you can do things to get better static static guarantees in the types, but ultimately you're going to want a dynamic security model. ie. deleteFile :: String -> IO () cannot be checked statically - you need to know what file that string refers to at runtime.

Regarding networking, if we're talking about running code safely across several trusted machines (with trusted runtimes), then exposing values like file handles in a machine independent way seems like perfectly reasonable magic to provide under the covers. You can even provide a monadic way to ask if you're dealing with a remote object or a local one - I don't see a problem there. On the other hand, RT has very little to do with the case of networking with untrusted parties.

### What does that signature get

What does that signature get you that adding 'getWorld :: IO World' wouldn't?

World is no longer an ambient authority. Your getWorld exposes an ambient authority. Maybe I didn't explain the new signature properly: 'main' with the new signature is now the entry point of the program.

File handles are a simple example of that. RT makes sure those don't leak.

And yet, why can't RT be used to ensure that authority doesn't leak? I'm placing authority to access the outside World in a value just like file handles.

As always you can do things to get better static static guarantees in the types, but ultimately you're going to want a dynamic security model. ie. deleteFile :: String -> IO () cannot be checked statically - you need to know what file that string refers to at runtime.

The point of World is that deleteFile is not invokable ambiently. See the cap-talk post I linked to above for the advantages that World brings here. In summary: World can effectively encapsulate a chroot. You hint that there may be some other way that maybe I'm not getting:

The runtime system that runs main is going to run it in a context - there is no requirement that the "callbacks" bound in a monad will always deterministically get the same results. That's the whole point :).

And where is this runtime system defined? The point of reifying World as a value, is that the runtime is now accessible from within the language itself, and can be rebound, rather than as some mysterious, ambient primitive. I tried to provide an ontological argument for why the ambient runtime doesn't make sense in an RT language, but perhaps it's not as convincing to others as it is to me. :-)

I have more to say on mutually trusted runtimes, etc. but I'll stick to the core contentious issue for now. :-)

### Abandoning the World

I think I understand what you're trying to do, and I wouldn't do it with a World type. I think it distracts from the issue. The problem today is that Haskell's entry point for every program has the same type (IO ()), which grants way too much authority to most programs. Rather than trying to fix this by adding a World type (and I confess along with others, that I don't really see how that solves it), you'd like to be able to specify a more restrictive type for the program's entry point, right? Then you put the onus the OS/runtime to understand how to satisfy the obligations encoded in that type (which might include things like rights to open a network socket on a particular port, read files inside some root directory, and so on).

As you said, you can think of the problem as being in part due to the fact that the OS doesn't know anything about the program, so all entry points must have the same very general type. How would you fix this if the OS itself were a larger Haskell program? Just as above (and in the recent Swierstra pearl), you'd use a more granular IO type, not a World type.

### I agree I'd like to see a

I agree I'd like to see a type reflected on main that reflects that a program uses networking and openGL, say, but there's still a level of dynamic security that you can't get this way. Like if you want to take an arbitrary IO () off the network and grant it file access, but restricted to a particular directory.

### Then you put the onus the

Then you put the onus the OS/runtime to understand how to satisfy the obligations encoded in that type [...] Just as above (and in the recent Swierstra pearl), you'd use a more granular IO type, not a World type.

I agree that World is not an ideal situation. I said as much in my original post. I just don't have a better structure yet. This structure will have to satisfy secure FFIs. But I disagree with one point: using World IS putting the onus on the runtime. I'll explain, and hopefully clarify how World achieves more granular authority.

More granular monadic types only gets you so far. Let's assume we have separate types for files and network namespaces (I'm not familiar with the real Haskell types, so I'm just guessing):

type FileSpace  -- assume both are kept abstract
type IpSpace

openFile :: FileSpace -> FilePath -> Maybe IO Handle

main :: FileSpace -> IpSpace -> IO ()

This is similar to the World value, except more granular. Do you agree?

Now you want to add some new namespace for a feature that wasn't previously designed for, ie. linked in via the FFI. How would you support that if the entry point must have a well-defined structure? Here's how I sort of envision this working using World:

type World
type FileSpace

root :: World -> FileSpace   --FFI call
openFile :: FileSpace -> FilePath -> Maybe IO Handle
openDir :: FileSpace -> FilePath -> Maybe Directory
chroot :: Directory -> FileSpace

main :: World -> IO ()

In capability terms, World is a closely-held authority, ie. it is held only by a trusted core of code and only derivative rights are doled out as needed (like FileSystem). I know I used World in the original openFile definition, but this structure reflects more closely what I have in mind.

A similar structure would exist for IP addresses, and other functions that interact with the world. Anything accepting a World parameter eventually bottoms out to an FFI call. The runtime maintains a map of (function, int) pairs. The int indexes into a blob the runtime uses to store the FFI data for the module, and this blob is passed to the module's function when it calls into it.

The runtime plays the intermediary for all World calls. I would love a more granular design, I just haven't been able to come up with one just yet. I just started considering this issue though, so I'm sure there are better designs. Something involving Dynamics might be able to bind the appropriate types to an entry point with no well-defined structure.

### World is no longer an

World is no longer an ambient authority. Your getWorld exposes an ambient authority. Maybe I didn't explain the new signature properly: 'main' with the new signature is now the entry point of the program.

Ok I think understand what you're proposing now - you're going to provide some function like restrictPermissions :: ... -> World -> World that lets you build new restricted worlds. I think World is a bad name for this -- something like IOPermissions would be better I think.

But IMO there's also a better way to do this. Leave everything the way it is now, and provide withPermissions :: ... -> IO () -> IO (), that runs a set of operations with a restricted set of permissions.

### I think World is a bad name

I think World is a bad name for this -- something like IOPermissions would be better I think.

Perhaps. But it's not just IOPermissions, it's the visibility of anything from a program. Effectively, World totally defines the world in which the program runs. Perhaps Environment is better? Namespace might be another possibility. All IO, all namespaces, everything visible to the program, that is not defined within the program, is eventually accessed via World. World thus seems pretty appropriate to me. :-)

But I certainly welcome a more granular design, or a more appropriate name.

.Leave everything the way it is now, and provide withPermissions :: ... -> IO () -> IO (), that runs a set of operations with a restricted set of permissions.

I'm not clear on how this would work. What parameters are passed in the '...'? How does the runtime[1] restrict the permissions in the returned IO ()?

In any case, I'm not sure I'd have confidence in a system built on unsafe ambient authorities, with the ability to restrict them for safety, over a system for which authority is constructed from safe initial conditions. 'Fail open' is the wrong default. Or am I missing something?

[1] I'm assuming it's the runtime that will do this.

### The code you just posted in

The code you just posted in another part of the thread gave me a much better understanding of what you're envisioning - I didn't look at the code you linked to closely enough. The term "World" is fine - it just needs to be understood as "World Handle" in the same way that "File" is understood as "File Handle".

My withPermissions was meant to be a template function restricting whatever it was you were trying to restrict. So for example, in this case for which you provided sample code, you'd maybe have withRoot :: Directory -> IO () -> IO (). Of course if you want to be able to pass in multiple FileSpaces, that doesn't work.

Anyway, I think what you're talking about is pretty reasonable. I'm not (yet) convinced it's the best way to solve this problem in a new language, but it looks like a pretty good way to go about it in Haskell.

### I'm not (yet) convinced it's

I'm not (yet) convinced it's the best way to solve this problem in a new language, but it looks like a pretty good way to go about it in Haskell.

The new language issue is the one I'm most interested in. :-)

Do you have some other approach in mind that could provide the same dynamic and static guarantees? Capabilities are good for reasoning about dynamic security policies, and monads are at least good for basic static ones. I suppose you could characterize my approach as trying to make monads into dynamic capabilities by removing, what I think are, their implicit ambient authorities.

### Well

I do have another approach in mind, but I haven't worked all the details out yet. I do think there are a number of reasons to prefer having capabilities determined by the context in which a monad is interpreted rather than having them passed around as values and hardcoded into the monad at construction time.

### I do think there are a

I do think there are a number of reasons to prefer having capabilities determined by the context in which a monad is interpreted rather than having them passed around as values and hardcoded into the monad at construction time.

If I understand correctly, you mean the capabilities are somehow looked up in the environment of the interpreter, or somehow provided by the interpreter, rather than the capabilities being embedded in the program constructed by openFile and friends. If so, can you give me an example of an advantage to this approach?

I think there could be an advantage here, but only if the environment of the interpreter could be constructed with only the authorities required.

I do have another approach in mind, but I haven't worked all the details out yet.

Please post here if you do figure it out! I for one will read with interest. :-)

### World vs. world

Saying main :: World -> IO () is weird because one possible implementation of IO a is IO a = World -> (World, a) (i.e. the state monad using the world as a state space). So actually the IO already contains the world we only need to tame its ambient authority, namely introduce the possibility of failure (e.g. using Maybe) to the necessary places (e.g. openFile :: FilePath -> IO (Maybe Handle), openSocket :: NetAddr -> IO (Maybe Socket)) and a runWith :: Authority -> IO a -> IO a, so we can write:

main =
do foo
runWith (NoAuthority except (ReadFile "foo" & WriteFile "foo")) \$ bar
baz

With this we can ensure that we run both foo and baz with all authority given to the program, but bar can only access file "foo" and nothing else. With a decent algebra of authorities we can express any constraints we want. If we don't care about exceptions we can just reuse the current functions and instead of using Maybe as a result we just throw authority exceptions (which are part of the IO Monad) and either ignore the violations under runWith or catch the exception explicitly.

The monadic bind gives all the authority it has to the next computation, which is against POLA. It is capability like (because pure code can't use the world because IO jails it), but the capability is too big. Another problem is that this capability (i.e. IO with runWith) doesn't express in its type what authority it confers, so we leave the realm of static type checking. We could play with this either adding a phantom type to the IO (i.e. IO a' a where a' is the authority required by the computation to produce a) and use a specialized monadic bind that propagated the constraints (i.e. (>>=) :: IO a' a -> (b -> IO b' b) -> IO (a' :&: b') b'). With inference it ensures that a IO computation can only be executed if the program has the necessary authority. Making such thing using Haskell's type system will probably lead to very cryptic error messages, but I think it would do what you want.

### This is basically what I was

This is basically what I was advocating a few posts ago (replace your runWith with my withPermissions). But I think his approach has some advantages in Haskell. To 'split up' IO, you'd presumably use typeclasses: doNetworking :: NetMonad io => io (). But with typeclasses you can't pass in multiple instances of the same class, so you couldn't pass in two file systems using this approach. (Edit: And the bigger problem is that instances are decided statically) I think his approach is a fairly pragmatic one with haskell the way it is.

Note that his World isn't the same thing as World# in the implementation of IO, which is why I said it was badly named - it's more like an IORef WorldRoot. That is, it's a handle to the World.

### Airmchair philosophy at high noon!

You might want to reify World as a value, but you don't want to make main special - what you want is a situation akin to State and runState, in which the World gets passed in when you run an IO computation. Amongst other things, this lets you supply a function along these lines: chworld :: (WorldCap -> WorldCap) -> IO a -> IO a

That type prevents you from leaking the World out anywhere. Adapting to finer-grained capabilities left as an exercise for the reader.

The runtime could always be made accessible via the IO monad and actually changing it was always going to be an effectful operation, so we should expect such things to have an IO type. If you really want to see the initial World get passed in you may have some trouble: running pure code from within the IO monad via return is fine because you've still got return as your 'interpreter', but running IO from some conceptual pure land 'around' it suggests the existance of much the kind of ambient runtime you're trying to avoid! Where exactly did the capability to do anything actually meaningful come from?

Someone else built our substrate system and in the first instance the capability to run our code isn't part of our code. As such, it's not ambient in the sense of ambient authority because we can't actually touch it any further than it gives us the capability to do so. This doesn't prevent various forms of self-modifying code - one of the things I always liked about the world-passing model is that at least theoretically the semantics don't break down if you call launchMissiles and nuke your own computing substrate. You get a world in which your substrate is now made of glass and no more evaluation is taking place. But our code doesn't subsume the laws of physics, the laws of physics subsume our running code.

### Why take apart IO?

As a quick added comment, there is a weakness present in any system where the capabilities aren't tied to the monad they're used with at the type level: it's possible to find out that something has more capabilities than you thought it did - to put it another way, you can't actually sandbox. That's pretty dangerous in any system with dynamic types present, as a capability can be smuggled out via one.

One way to look at this is that if you keep all your IO tied to a monad then that monad effectively holds a "use IO capability" capability. Splitting up that capability is also a good idea.

### We're definitely stretching

We're definitely stretching my abstract understanding of monads here, so please bear with me. :-)

I'm assuming your chworld function is intended to rebind the WorldCap contained within the given IO a. That might solve my issues, but I don't understand how it would work. It seems like you would need some extraordinary runtime support, and the complexity boggles my mind. I'm probably missing something obvious.

The runtime could always be made accessible via the IO monad and actually changing it was always going to be an effectful operation, so we should expect such things to have an IO type.

Hmm, I'm not always trying to change my runtime, but when launching a subprogram, I may opt to give it a subset of the runtime to which I have access. In other words, I would delegate only the authorities which I feel the subprogram needs. A subset of my World.

If I am trying to change the runtime, I think I agree with what you said.

If you really want to see the initial World get passed in you may have some trouble

I don't care much whether World is passed in explicitly, so long as I can reason about and delegate authority easily in as fine-grained a manner as I like, just like I can reason about and delegate values in a functional language. That's the ultimate goal in my mind.

I currently happen to think that RT is the ticket here, since all values within the language already follow capability discipline, side-effects are to follow. Only FFI-based functions like openFile which cause such effects give me pause.

Such authority in Haskell is not managed in a fashion that I currently recognize as being amenable to such fine-grained delegation. That's the limitation I'm trying to address. Whether it's a limitation of my understanding or Haskell I don't know. :-)

I look at the signature for 'main':

main :: IO ()

and I see an IO program. I believe that is a valid interpretation of IO: a program containing imperative statements which are executed by an interpreter. What I don't see is the input to this program, or the input to the interpreter. Where does the IO program or the interpreter get its authority to invoke openFile? To me, it looks like openFile coalesces a Handle from the ether. The hidden structure performing the mapping from FilePath to Handle is everywhere and nowhere. It's ambient. Ambient authority like this is the anti-thesis of capability security.

If I were designing a Haskell operating system, and I was at the point where I needed to define the signature of the standard entry point 'main' for a Haskell subprogram, where such a program should not be able to affect other subprograms running concurrently, unless given a channel by which to communicate with another subprogram, what would the signature of 'main' be? That's the ultimate question. I figured World -> IO () was an answer, where World did not contain any such channels. I'm probably wrong, but if someone gives me a right answer, I'll be happy. :-)

Like I said, I'm pushing the limits of my understanding of monads, so any clarification is much appreciated. If the data structure that openFile uses to look up the path is somehow "contained within IO", then something like your chworld which swaps IO's internals would probably suffice. I just don't yet understand how it works.

but running IO from some conceptual pure land 'around' it suggests the existance of much the kind of ambient runtime you're trying to avoid! Where exactly did the capability to do anything actually meaningful come from?

I feel there's some meaningful insight for me in this sentence that I haven't quite grasped. Could you perhaps clarify?

### You get to write the interpreter

I believe that is a valid interpretation of IO: a program containing imperative statements which are executed by an interpreter.

The top-level main program gets its authority from the fact that it's being interpreted at top level. Statements like 'runWith' or 'withPermissions' or 'chworld' would basically say 'interpret the following statements in a modified context'.

### The top-level main program

The top-level main program gets its authority from the fact that it's being interpreted at top level. Statements like 'runWith' or 'withPermissions' or 'chworld' would basically say 'interpret the following statements in a modified context'.

Ok, I think I understand. So I could also achieve my goal by defining a different interpreter, which chroots all openFile calls under a given root directory? I can do this for IP addresses, and all other dynamic authorities as well and it doesn't become unwieldy? And the last question: is this possible in Haskell, or only if the language were designed with this in mind?

### I don't think you'd

I don't think you'd generally want to write a different interpreter... you just modify the interpreter environment. Anywhere in your approach that you explicitly pass in a capability, you would just get the capability from the environment instead. I think you could do this in haskell now, but it would be clunky.

Anywhere in your approach that you explicitly pass in a capability, you would just get the capability from the environment instead.

I suppose it depends on what happens when the capability is not present, or must be granted only conditionally based on user input. How would you implement a Powerbox as I describe in another post using your approach?

### So I could also achieve my

So I could also achieve my goal by defining a different interpreter, which chroots all openFile calls under a given root directory?

Right. IO programs have an ambient authority, but we can delimit the use of a given authority with calls to that different interpreter. Which is probably really the same one with a parameter (its capabilities) swapped.

And the last question: is this possible in Haskell, or only if the language were designed with this in mind?

Once you've built your IO-like monad (an IOCapability monad with a runIOCapability function is probably your best bet for now, but then you know IOCapability code is capability secure even if IO code isn't), the rest's a matter of whether you can manipulate your collection of capabilities appropriately.

One way of working with this is to go off and work out what you'd do in your own language using whatever your favourite metalinguistic tools are to experiment, then sit down and encode it in Haskell. You'll probably spend more time picking how to encode something akin to extensible records than anything else though!

### If I were designing a

If I were designing a Haskell operating system, and I was at the point where I needed to define the signature of the standard entry point 'main' for a Haskell subprogram, where such a program should not be able to affect other subprograms running concurrently, unless given a channel by which to communicate with another subprogram, what would the signature of 'main' be? That's the ultimate question.

Rather than something like World -> IO (), I would look to encoding more information in the result type (basically, between IO and ()). For a very concrete example, consider this from the Swierstra paper:

cat :: FilePath -> Term (Teletype :+: ReadFileSystem) ()

This is the regular unix cat command. Now look:

main :: Term (Teletype :+: ReadFileSystem) ()
main = cat "foo/bar.txt"

Now main has a much richer type than IO (). In particular, it clearly states which capabilities must be provided by the runtime. In order to run this program, I have to grant Teletype and ReadFileSystem capabilities. By "grant" here, notice that I do not mean that we need to pass a dynamic value encoding the capability. I simply mean that the Haskell runtime needs to expect and tolerate a main value with this type. How I (the user, the OS, etc.) indicate to the runtime which capabilities I wish to delegate is of no importance here.

However, there is one obvious problem with this scheme. There's no way for a program to dynamically determine which capabilities it might need. It must declare all required capabilities up front, and the user must grant them all at once or not run the program at all. In other words, there's no way for the program to reach a particular point and say, "Whoops, now I need to write this file. Is that ok?" I don't know how other capability secure approaches handle this.

I feel there's some meaningful insight for me in this sentence that I haven't quite grasped. Could you perhaps clarify?

I think you're right, but I'll leave it to Philippa. ;)

### In order to run this

In order to run this program, I have to grant Teletype and ReadFileSystem capabilities.

ReadFileSystem is too broad an authority though. The structure of capability systems which I'm trying to replicate here, is that each program is run with its own private filesystem. In capability systems, each instance of a program runs with full read/write access to its own private directory. A 'scratch' space if you will. Just like each program runs in its own virtual memory space.

The fact that 'cat' requires only ReadFileSystem is an interesting property, but without being able to chroot the program, we haven't achieved a sufficient reduction in its authority. Now if "foo/bar.txt" could also be interpreted with respect to some root that the user, or a program running on the user's behalf could specify, and cat couldn't escape that root without a grant from the user, then that would suffice.

It must declare all required capabilities up front, and the user must grant them all at once or not run the program at all.

Yes, that is a downside, but still potentially useful. This would define the minimum capability set that the program needs to run, and dynamic authorities that are acquired and released while the program runs require another mechanism. My World technique is such a dynamic mechanism, though poorly specified at this point.

"Whoops, now I need to write this file. Is that ok?" I don't know how other capability secure approaches handle this.

I'll describe how, and you'll hopefully see the advantage of a parameter-passing implementation like World.

Consider how I defined openFile:

root :: World -> FileSystem
-- open a file mapped into our FileSystem
openFile :: FileSystem -> FilePath -> IO Maybe Handle
-- request a file be mapped at the given FilePath
requestFile :: FileSystem -> String -> FilePath -> FileSystem

I specified that World has no a priori concrete implementation that a program can instantiate. The concrete implementation is constructed by the runtime at program launch.

Capability systems grant access to files outside the program's scratch space, or any authorities the program does not have access to, via a Powerbox. The Powerbox is the only piece of trusted code that mediates communication between the user and a program. It ensures that all dialogs or output it displays is unspoofable, and distinguished from output from the program. See for instance, Marc Steigler's work in taming OCaml, and how he implemented a Powerbox for OCaml console applications.

So basically, the application can attempt to openFile on a file it needs to operate on, but if it doesn't find it, it invokes requestFile to request that a file be mapped into its local namespace. requestFile is an upcall to the user's Powerbox, and the Powerbox then displays an unspoofable dialog to the user in whatever environment the program is running in. If it's a console application, it displays the program's request for a file on the console, if it's a windowed application, a recognizable, unspoofable file selection window is opened.

Alternatively, one could even intercept all openFile requests if the file is not already mapped into the local namespace. This is the dynamic aspect which World/parameter passing enables. Note that this approach may be slightly different than other capability systems you may have read about. It simply requires the minimal number of changes to existing programs which assume ambient filesystem access.

I'll briefly describe the pure capability approach, so you can see the stark difference in achievable authority. Let's assume we're making 'cat' capability-secure. We have to slightly change the interface to do this. Instead of accepting a file path, ie.

cat "/path/to/file"

cat will now accept a file descriptor, ie.

cat < "/path/to/file"

The user's shell is the trusted Powerbox, which opens the file on the user's behalf, and launches cat passing it the file descriptor, and the descriptor to stdout. Here, cat runs with no authority at all except to read from the file, and write to stdout. File descriptors are capabilities.

If we want cat to instead output to another file, then we have perform the redirect ourselves, ie.

cat < "/path/to/input" > "/path/to/output"

The user's shell then replaces the stdout descriptor with the descriptor to the output file. No changes to cat are necessary, since cat deals only with descriptors. This reflects the dynamic security policies achievable with capabilities, and these patterns scale to larger and more sophisticated systems since we're dealing only with interfaces and parameter passing.

I think a similar approach is achievable in Haskell, except with an even greater degree of safety since we can reason about a program's effects in a more rigourous manner.

### I think the type of syntax

I think the type of syntax Matt Hellige just proposed is exactly the kind of thing you want, but with a different interpretation of the signature: The type isn't a guarantee to the monadic code that certain capabilities will be granted, it's a guarantee to the runtime that only certain capabilities will be requested. Then the runtime can impose whatever dynamic authority model it wants.

So for example, take your types like openFile :: FileSystem -> String -> IO () and encapsulate the FileSystem -> IO a part into an abstract monad so that FileSystem is reflected in the type but without a value that can be leaked. I still haven't gone through it carefully, but it looks like the paper Matt referenced shows that you could even get this to work smoothly(?) in Haskell right now.

### Actually, I don't think

Actually, I don't think there's much of a difference to the distinction I made in the first paragraph here. The point is that your capabilities should be failable. Teletype could be required to explicitly ask if a terminal is available before proceeding or an exception could be thrown if it isn't. The authority model is in dynamically determining what succeeds.

### Agreed, twice

I agree with Matt M's first reply, but I also agree with this second reply that it probably doesn't matter too much. You can handle dynamic capability requests this way, although at that point I guess you need to account for the possibility of failure (at places where authority might not be granted).

It's not right to think of a capability like ReadFileSystem as necessarily granting permission to read from a single canonical filesystem. Going back to the notion of "IO programs" and "IO interpreters", it's probably better to think of the capabilities as a specification of precisely what features the program requires from its interpreter. The interpreter can provide these features any way it seems fit (for instance, ReadFileSystem could easily be interepreted with respect to chroot).

I'm considering your requestFile function, and I believe I begin to see the point. Let me see if you agree with this: you need something like requestFile to dynamically grant/deny permissions (via the Powerbox). In an RT language, this function must transform the filesystem (capturing the result of the request for authority), and relevant operations, such as openFile, must be defined with respect to such a filesystem root. But now, why do we need World? Won't this work just as well?

root :: FileSystem -- no more World here.
-- open a file mapped into our FileSystem
openFile :: FileSystem -> FilePath -> IO Maybe Handle
-- request a file be mapped at the given FilePath
requestFile :: FileSystem -> String -> FilePath -> FileSystem

In fact, it seems to me that this is identical in all respects! In the absence of a linearity guarantee, for instance, World is just a sort of magic and slightly irritating alternative to (). ;)

Naturally, we do give up static capabilities at this point: we can encode in the type of main that the program may require the use of requestFile, for instance, but not which particular files it might end up requesting. This seems reasonable to me.

I'd be very curious if you agree with this way of describing the situation?

(Incidentally, I think there may be better ways to model this in an FP setting than with a transformed filesystem root, but at least this matches standard OS terminology.)

Finally, I feel much more tentative about these suggestions than I probably appear. I really don't know very much about the capabilities work, and I admire your careful thinking about these questions. You're making some very fine and careful distinctions in your posts that I'm not at all sure I've fully understood. So please understand all my posts merely as food for thought!

### I'm considering your

I'm considering your requestFile function, and I believe I begin to see the point. Let me see if you agree with this: you need something like requestFile to dynamically grant/deny permissions (via the Powerbox).

It's not just about granting/denying permissions to your application as a whole - it's about compartmentalizing permissions so that each (monadic) subroutine can only use those capabilities explicitly passed in by higher level code.

### You can handle dynamic

You can handle dynamic capability requests this way, although at that point I guess you need to account for the possibility of failure (at places where authority might not be granted).

I think I agree with both of your stances on this with respect to dynamic authority requests. Semantically, the possibility of failure should definitely be reflected in the return type. That's not the whole story with capabilities however. I think that most capability programs wouldn't even make such requests, since the majority of programs don't directly interact with users. These programs instead receive their authorities directly from other programs.

It's not right to think of a capability like ReadFileSystem as necessarily granting permission to read from a single canonical filesystem.

A small correction: holding a capability is a necessary and sufficient condition to exert the permission conveyed by the capability. Similar to how holding an open file handle is a necessary and sufficient condition to read and/or write to it, or holding an Int is a necessary and sufficient condition to increment it by 1. If I understand its purpose correctly, I wouldn't call ReadFileSystem a capability so much as a behaviour.

Also, There is some danger in looking up authorities in environments. When auditing a function in a capability-secure language, all the invokable operations are staring you right in the face: they are the function's parameters. The lexical scope contains all the references via which the function may affect the world, so auditing the code is a purely local affair. This is similar to the argument against global mutable variables, except applied to authority and effects.

With environments I suspect this property may no longer hold, as the authority of a program at any given point is no longer dictated by the references passed into the function, but by the control flow of the program up to that point. This is much harder to reason about, and an overly broad authority would be effectively invisible until exploited. Please correct me if I'm missing something which mitigates this danger.

There is one further danger which is the subject of this entire thread: Confused Deputies, where a client can induce a server to misuse one of its authorities on the client's behalf, and the server has no way to defend itself. This is not merely a program bug, it's a property of an access control system which determines whether the server has enough information to defend itself from such an attack. Capabilities are currently the only known solution to the Confused Deputy problem from arbitrary clients. However, as the top post indicates, using rights amplification can introduce confused deputies even in capability languages.

So I'm still skeptical about using environments. They seem too much like existing access-control list systems, which means confused deputies are effectively insoluble.

In an RT language, this function must transform the filesystem (capturing the result of the request for authority), and relevant operations, such as openFile, must be defined with respect to such a filesystem root.

Exactly. Most capability systems go a bit further and simplify. They do not provide a getParent function on a Directory, and openFile is interpreted with respect to a given directory:

openFile :: Directory -> FilePath -> IO Maybe Handle
-- main is given full read/write to a scratch directory
main :: Directory -> IO ()
-- request a file with a reason
-- notice the file isn't mapped, the handle is directly returned; a more capability-like interface
requestFile :: String -> IO Maybe Handle
openChild :: Directory -> String -> Either Directory File

You can see that there's no way to traverse up a directory, so we can isolate subprograms just by creating and passing in a sub-directory to be used as a scratch space. A subprogram can also request a file as before, but this is subject to an external decision which the program cannot influence.

But now, why do we need World? Won't this work just as well? [...] In fact, it seems to me that this is identical in all respects! In the absence of a linearity guarantee, for instance, World is just a sort of magic and slightly irritating alternative to (). ;)

For the restricted case of files only, perhaps (except FileSystem is still available ambiently, but suppose we pass it in to 'main'). Your structure matches almost exactly what I described here, except my post still includes World.

The reason I include World, is because I'm thinking of the more general FFI case, not just the special case of the file system. I want the entire FFI interface to naturally follow the same capability semantics.

Nothing within Haskell can construct a handle to the outside World. The Haskell world is an isolated universe unto itself, and the World passed in is the only link connecting the Haskell world with the outside world, similar to how a channel is the only connection a process may have to another process.

Haskell can construct values of type (), so it's not equivalent. We don't want to pass around World though, we would instead grant indirect access by passing around derivative permissions, such as FileSystem (or SocketOpener, or UsbDevice, or ...). That post details how I see World being used in the FFI if you're interested. No doubt there is a better solution, but it's all I've come up with so far.

Naturally, we do give up static capabilities at this point: we can encode in the type of main that the program may require the use of requestFile, for instance, but not which particular files it might end up requesting. This seems reasonable to me.

Yes, exactly. It would be very hard to statically capture the entire dynamic behaviour of the program in this way, and capabilities seem the most natural, compositional method of doing so. Capabilities are just values after all, and the program is constantly creating and destroying values. Just as we can't precisely define what data will be live at any given point in the program, we can't know the precise structure of the entire access control graph, so we limit the access graph to only the locally held permissions to make it easy to reason about how a program can exert its authority at any given point. I think there is at least a very strong connection to RT here.

I've thought a little about adding capability security to a language through an effects system before. Comonads might be more suitable for merging RT and CS than monads. A function getChar :: Handle -> IO Char would instead be getChar :: IO Handle -> Char. The comonad IO Handle could act as a capability. See Codata and Comonads in Haskell.

I'm not very familiar with comonads though, so I don't know if they are preferable to monads in general. I think there might have been doubts that they actually preserve referential transparency in all cases.

### Stretch and relax

I'm assuming your chworld function is intended to rebind the WorldCap contained within the given IO a. That might solve my issues, but I don't understand how it would work. It seems like you would need some extraordinary runtime support, and the complexity boggles my mind. I'm probably missing something obvious.

Hopefully others have clarified sufficiently - ultimately there's little need for runtime support because it's in some sense equivalent to holding a linearly-used capability inside a State monad and tweaking it appropriately.

but running IO from some conceptual pure land 'around' it suggests the existance of much the kind of ambient runtime you're trying to avoid! Where exactly did the capability to do anything actually meaningful come from?

I feel there's some meaningful insight for me in this sentence that I haven't quite grasped. Could you perhaps clarify?

I've described monads as "mutually-embedded DSLs" in the past - not only is IO embedded in Haskell, but Haskell is embedded in IO (largely via return). We actually want to view the outermost layer as an IO program, because then there's room for all the things that can go wrong in the real world - evaluation of a pure function can fail at any time and that's the world's fault and we've still got semantics for it.

All the authority necessary to do things is invested in the 'interpreter' and doled out as requested. It's not a good idea to ask where the interpreter came from, because a few iterations later we're asking where the universe came from. If we attempt to build the interpreter ourselves then we get in an even bigger muddle!

But when we call a subprogram, now we're on the outside of that program and we're free to build it ourselves. If the type system's left open enough, we can run the subprogram inside a virtual machine and inspect the results afterwards - and barring introspection facilities in IO, the subprogram'd be none the wiser.

### All the authority necessary

All the authority necessary to do things is invested in the 'interpreter' and doled out as requested.

This wording concerns me. What if you don't actually trust the program you want to run? Let's say it's a mobile Haskell program you downloaded from one of your competitors. You don't necessarily want to satisfy all requests. The problem is succinctly summarized by Matt M in clearer Haskell-ese than I'm capable of. :-)

we can run the subprogram inside a virtual machine and inspect the results afterwards - and barring introspection facilities in IO, the subprogram'd be none the wiser.

I believe this is very similar to what I want to achieve. If all authorities are reified as capabilities, the system is fully virtualizable, in which case we can compartmentalize subprograms, and as you say, the subprogram couldn't know. I still don't understand how monads accomplish this though, so I have some learning to do here. :-)

### All the authority necessary

All the authority necessary to do things is invested in the 'interpreter' and doled out as requested.

This wording concerns me. What if you don't actually trust the program you want to run?

Then give the interpreter less authority!

It may help to imagine that rather than working with one monad type IO, you're working with an IOMonad typeclass with many implementations.

### Example

Code in IO can be read much like ordinary imperative code that does stuff, but it is more accurate to think that it is pure code constructing a some data structure describing behavior, which you might eventually choose to perform (you need the latter view to make sense of things like lists of IO actions).

That description is quite far from the actual implementation of IO, but quite close to the implementation of the monad M in the paper A Language-Based Approach to Unifying Events and Threads. M is like IO, providing file IO and concurrency, but it is also completely defined in pure Haskell. Some "application code" with a type like M Int actually evaluates to a tree of "syscalls", with a subtree for each possible result.

Only the worker functions which run steps from the tree actually do anything. You could have them check what operations are allowed if you like. All that said, I think the way to get capability security in Haskell is to restrict access to the functions that construct dangerous actions in the first place, perhaps with some fancy types when you want to ensure a capability has been absolutely excluded from some part of the program.

### Some "application code" with

Some "application code" with a type like M Int actually evaluates to a tree of "syscalls", with a subtree for each possible result.

Clearly if I define my own monads which with capability secure syscalls, then I can ensure my openFile and such are properly chroot'd, etc. Philippa seemed to imply that the openFile in the Haskell standard library could be chroot'd somehow. I don't understand that would work. But as he said, perhaps I need to work out some monads for myself.

### Small correction

But as he said, perhaps I need to work out some monads for myself.

A little off topic, but "he" is actually "she". There are so few women in our community, I think we should be careful to recognize them and show proper respect!

### Chroot

IO is just another library. If you see how you can do what you want if you define your own monad, then you could change IO too.

My point is not that you define "capability-secure syscalls", but that any monad is denotationally just a description of the actions you want done and the actual interpreter can execute that description how it likes. It's just that Peng Li's thread system is explicitly implemented that way so it's easier to see.

### Philippa seemed to imply

Philippa seemed to imply that the openFile in the Haskell standard library could be chroot'd somehow.

Ah, apologies - that'd be the source of the confusion. No, it's not chrootable. More generally, the Haskell libs aren't designed for reflective behaviour.

But as he said, perhaps I need to work out some monads for myself.

I think you've been confused by the sight of my evil twin?

### Forgot something: Hopefully

Forgot something:

Hopefully others have clarified sufficiently - ultimately there's little need for runtime support because it's in some sense equivalent to holding a linearly-used capability inside a State monad and tweaking it appropriately.

What I don't understand is how this affects the behaviour of openFile, such that we can automagically chroot any paths given to openFile. I think if I understand the mechanics of that, I'll understand how the interpreter environment can be restricted, virtualized, etc.

### The short answer is that

The short answer is that openFile is only given semantics by the interpreter - at which point, the question's just "how would you implement a chrootable openFile anywhere?".

It might be that you're best off sitting down and implementing a few custom monads based around state machines and the like as an exercise - parsing and evaluation monads being classic examples.

### A quick footnote

One might consider

chworld :: (WorldCap -> WorldCap) -> IO a -> IO a

to be the function "local" in an instance of the MonadReader class.

### Facets and Membranes

3. Can anyone think of any other possible solutions to this problem, preferably with a minimal burden on the user?

Instead of completely throwing away scope-based authority, you might consider introducing facets and a membrane. In capability talk, a facet F of object X provides a specific part of X's authority. For instance if X refers to a file, F could be a facet providing read-authorty to X. F could simply be implemented as a wrapper to X, fowarding only read requests. To make a real facet, the wrapper should be non-unwrappable.

One facet would provide the full authority of the package scope (the file itself), and another one would provide only the authority available in the public scope (the read-only wrapper). Providing for such scope-based facets may be a relatively simple task for the compiler. The deputy can then use the client-provided facet for the client's purposes, and thus prevent being confused.

However, to mimic actual package scoping, the package would need to be protected by some kind of membrane pattern: no package-scope facet should leak to the public scope in an unwrapped state. Sending a facet from outside the package-scope needs no special attention: facets should never be unwrapped. Efficiently implementing such a semi-permiable membrane may be a challenge.

One more point of attention : as X and F would no longer be equal, you will lose backwards compatibility with normal OCaml. That may be a burden to the user.

### How to make it automatic?

While I understand your approach in theory, I don't quite grasp how this could be an automatic conversion the compiler performs. It seems like the user would have to refactor his code to establish this privilege separation, and while that's a good thing in general, it wouldn't rule out this mistake in the first place. I'm hoping for some static analysis that could prevent a Confused Deputy.

Basically, the signature Bob specifies members O and B each of which define a type t. The confusion arises because of an implicit rights amplification from a Bob.O.t to a Confused.O.t in the Confused.B.compile function.

My elimination of package-scoped functions essentially only eliminates this implicit rights amplification, not scope-based authority in general (at least, I don't think it does). If Confused.O and Confused.B weren't supposed to be sealed by Bob.O and Bob.B respectively, then all functions would be visible to everyone, and no confusion is possible. The problem is that the sealing is applied after visibility inside the Confused module is already resolved. By reversing the process, the confusion is eliminated. This is equivalent to eliminating package-scoped functions for modules that are sealed by a signature (I may have forgotten that last clause in my original formulation).

I still haven't heard a good use-case for such package-scoped functions, so I still think trading package-scoped functions for binary functions is a good deal. I welcome any further suggestions though.