The Structure of Authority: Why security is not a separable concern

The Structure of Authority: Why security is not a separable concern, by Mark S. Miller, Bill Tulloh, and Jonathan Shapiro:

Common programming practice grants excess authority for the sake of functionality; programming principles require least authority for the sake of security. If we practice our principles, we could have both security and functionality. Treating security as a separate concern has not succeeded in bridging the gap between principle and practice, because it operates without knowledge of what constitutes least authority. Only when requests are made -- whether by humans acting through a user interface, or by one object invoking another -- can we determine how much authority is adequate. Without this knowledge, we must provide programs with enough authority to do anything they might be requested to do.

We examine the practice of least authority at four major layers of abstraction -- from humans in an organization down to individual objects within a programming language. We explain the special role of object-capability languages -- such as E or the proposed Oz-E -- in supporting practical least authority.

An important overview of why security properties cannot be an after-thought for any platform, languages and operating systems included. To this end, the paper covers security properties at various granularities from desktop down to object-level granularity, and how object-capabilities provide security properties that are compositional, and permit safely composing mutually suspicious programs.

A recent LtU discussion on achieving security by built-in object-capabilities vs. building security frameworks as libraries reminded me of this paper. Ultimately, the library approach can work assuming side-effects are properly controlled via some mechanism, e.g. effect types or monads, but any solution should conform to object-capability principles to maintain safe composition.

An example of a capability-secure legacy/library approach is Plash (the Principle of Least Authority SHell), which provides per-program file system namespaces. Any library interface to the file system should mimic this file system virtualization, which effectively pushes side-effect control down to OS-level objects, and which is essential to safely composing mutually suspicious programs that access the file system.
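As a rough sketch of what such a virtualized file-system interface might look like as a library (hypothetical FileCap and Namespace types of my own invention, not Plash's actual API):

import io

class FileCap:
    """Capability to a single file, restricted to the modes granted at creation."""
    def __init__(self, path, modes):
        self._path, self._modes = path, frozenset(modes)

    def open(self, mode):
        if mode not in self._modes:
            raise PermissionError("capability does not grant mode %r" % mode)
        return io.open(self._path, mode)

class Namespace:
    """Private per-program namespace: the program can reach only what it was granted."""
    def __init__(self, grants):              # grants: visible name -> FileCap
        self._grants = dict(grants)

    def open(self, name, mode):
        if name not in self._grants:
            raise FileNotFoundError(name)     # anything else simply does not exist
        return self._grants[name].open(mode)

# The launcher decides the authority; the sandboxed program receives only `ns`.
ns = Namespace({
    "input.txt":  FileCap("/home/alice/report.txt", {"r"}),
    "output.txt": FileCap("/tmp/copy.txt", {"w"}),
})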


There are no categories for

There are no categories for security or anything like it really, so I hope my choices are appropriate.

They make sense to me. I

They make sense to me. I tried to make the categories rather coarse-grained, but if there is interest we can add "security". Perhaps "safety", which is more general (though that will be confused with type safety, for sure)?

Security, safety, and trust

Security, safety, and trust are all very different topics.

Indeed

I think when used in context of programming languages:

  • safety relates to issues like thread safety and memory safety (the programmer cannot injure himself), or to system-purpose safety (fail-safe industrial and medical solutions);
  • trust means whether one component can rely on another component (local or remote) to match its assumptions or expectations;
  • security means data and processing security, as in the CIA triad.

More generally: safety

More generally:

  • safety relates to the integrity of your environment. If the environment's assumptions/guarantees can be violated, it's unsafe and will not behave as expected. An environment with no assumptions/guarantees is technically safe, but not that useful.
  • trust relates to assumptions/guarantees made by external entities and components. Trust is mostly social: there is no logical reason to trust someone else or their component; it's only that someone said they are trustworthy, or you have some recourse if they are evil (e.g., a security deposit).
  • security is the ability to restrict the propagation of information, e.g., I want some guarantee that my credit card number will not be made available to scammers.

Flow of information is only

Flow of information is only a part of security research, and access control models tend to be more about integrity. "Secure" is thus a safety property, but it's not generally discussed as such.

I can't think of a description that's coarse-grained enough to distinguish it from other safety categories though, or that would be meaningful to someone who isn't already knee-deep in this stuff.

Access controls limit the

Access controls limit the flow of information; I was being general rather than talking about specific techniques (i.e., not information flow analysis, but rather just the flow of information).

Safety generally deals with integrity, while security is more a matter of policy. Even if you have safety, it doesn't matter if the credit card numbers are available to the public through a neglect of policy! Without safety (meaning the system is not 100% reliable), a hacker has a chance of getting at otherwise private credit card numbers.

It is extremely useful to tackle each aspect separately, especially the issues of security and trust. Reliability of information should be handled differently from how public the information needs to be.

Access controls limit the

Access controls limit the flow of information; I was being general rather than talking about specific techniques (i.e., not information flow analysis, but rather just the flow of information).

I know what you meant, I'm just saying that no access control system to date can practically limit information flow between conspirators, and I suspect that trend will continue far into the future. What access controls can do is prevent mutually suspicious agents from interfering with each other, which is an integrity issue. So practically speaking, security is more about integrity than controlling the flow of information.

As for safety, security is a safety property by the strictest PLT definition of safety, but I understand the motivation to define a term as narrowly as possible to aid precision in discourse.

There are other forms of

There are other forms of security; for example, availability and quality of service are often security-critical. An example is that it is important that emergency calls can be made (under reasonable conditions) with phones.

Had a surprising discussion with a colleague recently about the difference between 'security' and 'privacy.' I always assumed security was a general encompassing term while privacy was synonymous with confidentiality, but she actually disagreed with the latter.

+1 for a security category :)

There are other forms of

There are other forms of security; for example, availability and quality of service are often security-critical. An example is that it is important that emergency calls can be made (under reasonable conditions) with phones.

Hmm, I'm not sure I see how this could be classed as a security property. Can you elaborate?

Liveness is a Security Property

If you do not consider availability/QOS to be security properties, then it logically follows that you do not consider denial-of-service attacks to represent system security violations. If you do not consider denial-of-service attacks to represent system violations, then I can offer you a very cheap computer-security solution involving a sledgehammer and some controlled arson.

Any reasonable definition of computer security must include a provision for liveness (accessibility, availability, QOS, disruption tolerance, graceful degradation) in addition to the various safety requirements. Basically, each participant must be able to act within its legitimate authority in addition to being prohibited from acting outside its legitimate authority. You cannot achieve this to the extent that participants can, without a legitimate authority, prevent other participants from utilizing their authorities.

Capability models offer a definition that clarifies what 'legitimate' authority means within a computational system: authorities endowed, granted, or communicated in the form of 'capabilities'. But they don't protect against illegal use of physical authority, such as cutting your phone lines, jamming your airwaves, stealing your laptop, or dropping bombs on computer centers. In the broad sense, computer systems must control the extent to which they are compromised even in the face of such physical threats. Software, protocols, and even language design will each be a big part of that picture (as are physical answers to physical threats, such as self-destructing memory).

The telephone system is a

The telephone system is a shared channel with an intrinsic maximum bandwidth. Each call has a bandwidth slice allocated according to some QoS requirements. A DoS due to high call volume doesn't strike me as a violation of any sort since it's a physical property of the channel. Preventing agents from exceeding allotted bandwidth is perhaps a security property of interest.

Physical Security

Ah. So if I broke through your window and stole your laptop and kitchen sink, there is no sort of security violation because I utilized a physical property of your window? Any computer security regime overlaps with physical security. We generally assume possession and protection of computer hardware by 'owners'. (The DRM crowd doesn't assume even that much.)

About the phone-line example in particular, you must ask: does your legitimate authority to the bandwidth start when you contract the phone service, or does it start when you successfully make a call?

Telephony switching networks weren't designed that way

It is a bad comparison

Wireless Phones, Foreign Phones, Red Phones, Green Phones

Which telephony systems in particular aren't designed that way? (Are you projecting some sort of (possibly dated) American cultural assumption onto this discussion? VoIP and wireless have been causing headaches about the security and survivability of our telephony infrastructure for years now.)

Ah. So if I broke through

Ah. So if I broke through your window and stole your laptop and kitchen sink, there is no sort of security violation because I utilized a physical property of your window?

There was no illusion that a window could prevent illegal entry. The "security violation", if it can be called that, is social.

does your legitimate authority to the bandwidth start when you contract the phone service, or does it start when you successfully make a call?

Good question. The physical packets should be stopped at the receiving tower/closest switching station if upstream bandwidth isn't available. Cell phones can be in contact with multiple towers at once, so if the aggregate bandwidth is insufficient to sustain a call then it must be dropped. Conventional phones are wired directly to switching stations, so the situation is simpler.

Tresspassing and Theft

Ah. So if I broke through your window and stole your laptop and kitchen sink, there is no sort of security violation because I utilized a physical property of your window?

There was no illusion that a window could prevent illegal entry. The "security violation", if it can be called that, is social.

Get your facts right.

[ Looked it up. In some countries/states it would also be classified as burglary; it depends on the legalese. ] [ Ok, a technical discussion, but still. Your definition of security is off. ]

Security is about Risk Management, not Expectations Management

There was no illusion that a window could prevent illegal entry. The "security violation", if it can be called that, is social.

What you say above implies a position: a "security violation" of a system occurs when one's illusions about system properties are violated. You reference illusions regarding properties of both the physical and social systems. We could s/illusion/expectation for a more positive spin. This matches more than one definition of security[1].

The problem I have with this position: it implies that the easiest way to achieve 'security' is to manage our expectations. If our expectations are kept low enough, we'd have the 'security' you describe even if we're regularly losing lives and property to known predators and threats. (You're hungry and that bully took your lunch money again, but hey! your money and lunch service is 'secure' because that's what you expected!) Additionally, 'expectations' don't work so well as an operational definition or metric: in practice, 'expectation' means something like 'amorphous, vague, moving goalpost'.

But, even under the 'expectations' concept for security, it would be reasonable to assert that, to the extent that 'availability' is among your expectations, violation of availability is a "security violation". If you expect your cellphone to work while it is juiced up and you are in a given zone - i.e. to let you call a taxi or an ambulance - then violation of this expectation is a violation of system security.

My opinion is that computer security encompasses a whole spectrum of risk management for a system of computerized command, control, communications, and information services. Information services are often real-time (surveillance, reconnaissance, situational awareness, sensors and data-fusion). Threats to these services include denial-of-service, theft, natural disaster, insiders, conspiracies, violent directed attacks, the slashdot effect, etc. ('Survivability' better connotes the holistic answer to these threats.) 'Risk management' does not mean 'zero tolerance for risk'; rather, it is about understanding and controlling both vulnerability and the level and form of any compromise. Very roughly, security_risk = Σ(threat * vulnerability * cost_of_compromise). One must associate real 'compromise costs' with denial or degradation of service, just as you would for leaks of information and authority assets or damage to their integrity.
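To make the formula concrete, here's a toy calculation with entirely made-up numbers; the only point is that denial or degradation of service contributes a term to the sum just like leaks and integrity damage do:

# Hypothetical numbers purely to make the formula concrete:
# rate = expected attempts/year, vuln = P(success | attempt),
# cost = loss per successful compromise.
threats = [
    ("denial-of-service", 12.0, 0.05,  20_000),
    ("credential theft",   4.0, 0.10, 150_000),
    ("insider leak",       0.5, 0.50,  80_000),
]

security_risk = sum(rate * vuln * cost for _, rate, vuln, cost in threats)
print(f"expected annual loss: ${security_risk:,.0f}")
# denial-of-service alone contributes 12 * 0.05 * 20,000 = $12,000/year,
# which is why availability belongs in the sum alongside leaks and integrity damage.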

(To clarify, security is not the only class of risk management for "computerized command, control, communications, and information services". For example, there also is much risk associated with marketability, liability, demand, cost of development, ambiguous and changing requirements, competition, etc.)

Security is not a separable concern for language design because it must be possible to locally reason about and control vulnerability and level of compromise from known and anticipated classes of threat[2], especially for extensible or pluggable or collaborative services (which are very common in the domain of C4ISR).

Capability languages attack vulnerability and compromise for several classes of security threats by controlling who obtains an authority and how much authority they obtain. They are capable of information-flow management via confinement, though significant reliance upon confinement both greatly hinders system expressiveness and ignores realities about the software development life-cycle (see the 'soapbox' section below, and an extended rant elsewhere).

Various information-flow and rights-escalation systems can interact with capability systems to control how far locally hosted (but untrusted) code is able to distribute 'sensitive' information or authorities even to the capabilities it possesses. These techniques may increase system expressivity, relative to confinement, while still allowing management of security risk. (Sealers/unsealers and membranes, for example, can allow use of unconfined external services for generic programming.)
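As an aside, the sealer/unsealer idea can be sketched in a few lines of Python; this only illustrates the pattern (Python can't truly enforce the box's opacity), and the names below are mine rather than E's actual API:

class _Box:
    """Opaque envelope; the payload is meant to be reachable only via the matching unsealer."""
    def __init__(self, payload, brand):
        self._payload, self._brand = payload, brand

def make_brand(name):
    brand = object()                      # unforgeable token shared only by this pair

    def seal(payload):
        return _Box(payload, brand)

    def unseal(box):
        if not isinstance(box, _Box) or box._brand is not brand:
            raise ValueError("not sealed by the %r brand" % name)
        return box._payload

    return seal, unseal

seal, unseal = make_brand("card numbers")
box = seal("4111 1111 1111 1111")
# `box` can be handed through untrusted code; only a holder of `unseal` recovers it.
assert unseal(box) == "4111 1111 1111 1111"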

And a good language for secure distributed programming must allow reasoning about and controlling security risk under events such as disruption, node failure, network partitioning, and (for open systems) even the slashdot effect. Again, one is not talking about achieving 100% protection, but rather about understanding and managing risks. A language and its distribution protocol can potentially support a developer in mitigating compromise or vulnerability by achieving certain forms of scalability, disruption tolerance, graceful degradation, and resilience. Bonus points if one can also reason about such threats as node-theft and compromise, violent destruction of nodes or subgraphs, anonymity of communications, and overlay or routing conspiracy.

<soapbox>I posit that the vast majority of modern programming is open and distributed in nature (already, today). Applications and libraries and services and Operating Systems and plugins and drivers and language implementations and even bodies of 'data' are developed and distributed and maintained by third parties whose trust should be suspect. Any in-the-large computation model that assumes a closed or deterministic world will ultimately unmake itself with various forms of regression and greenspunning. Once this happens, the paradigm is no longer aiding with local reasoning about in-the-large system properties, such as security, correctness, or time-space performance.</soapbox>

Anyhow, if you review common explanations of computer security, information security, information assurance, etc. you'll see 'availability' or 'accessibility' repeated often and prominently as one of several key properties. It isn't as though I'm making this up just to feed my hobby horse.

[1]: "A computer is secure if you can depend on it and its software to behave as you expect." Simson Garfinkel and Gene Spafford in Practical UNIX & Internet Security

[2]: Note that I didn't say 'controlling or managing threat' as a role of the language design. While it is possible to control certain forms of threat indirectly -- via reprisal, responsibility and traceability, liability, social and physical deterrence -- that really ought to be the last line of defense for computerized systems, rather than the first! I believe these defenses should be provided in the form of protocols, smart contracts, and resource/service markets. Horton is one example of a protocol intended for this sort of purpose.

I wasn't referring to DoS to

I wasn't referring to DoS against the network but against the user on the same device. A common justification for separating the telephony core from an application processing core is to ensure that a screw-up in the application stack doesn't interfere with the ability to make a call. It isn't about the integrity or confidentiality of what's going on in the phone conversation but, again, about ensuring that a call can be made.

Bringing capabilities into the picture is confusing. E.g., we can reify machine model resources as capabilities such as a CPU time slice to be delegated. It might make sense, but I believe that's in the research realm outside of how most security folks think about it. Assumptions about a capability system are fairly strong: in my example, we don't trust the entire application stack, which would include the capability/OS kernel. Emergency calls are often a life-or-death situation and thus heavily regulated.

**This is based on what I've heard from phone people -- I could be wrong in the particular example. However, it's an example of a general concern, which is the real point.

Bringing capabilities into

Bringing capabilities into the picture is confusing. E.g., we can reify machine model resources as capabilities such as a CPU time slice to be delegated.

Indeed, this is similar to how EROS did it with its capability to the real-time scheduler. The scheduler is an object with which a process must be registered in order to execute instructions. The capability used to register the process holds its scheduling information.
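Roughly, the shape of that interface (a toy sketch in Python with made-up names, not EROS's actual system calls):

from dataclasses import dataclass

@dataclass(frozen=True)
class ScheduleCap:
    """Capability to a slice of the real-time scheduler.
    Holding one is what entitles a process to run at all."""
    period_ms: int     # how often the slice recurs
    budget_ms: int     # guaranteed CPU time within each period

class Scheduler:
    def __init__(self):
        self._registered = {}

    def mint(self, period_ms, budget_ms):
        # Only the system's initial configuration holds the right to mint slices.
        return ScheduleCap(period_ms, budget_ms)

    def register(self, process, cap):
        # A process with no ScheduleCap simply never gets CPU time.
        self._registered[process] = cap

sched = Scheduler()
telephony_slice = sched.mint(period_ms=10, budget_ms=2)   # reserved for the call stack
sched.register("telephony-process", telephony_slice)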

Allocating the telephony process a real-time schedule is sufficient to meet your requirements, and if the application stack manages to violate that schedule it's violating a system integrity property. Am I being too loose with the definition of integrity?

Indeed, this is similar to how

Indeed, this is similar to how EROS did it

Based on systems I see commonly used, I view cooperative scheduling of physical resources (and, worse, in a secure context) as an unclear model for programming in-the-large. This is one of the reasons kernel hacking is hard. What about other critical resources, like network, disk, or deep cache access, that might be beyond CPU quanta? I see the ability to use capabilities for physical resources (and, more generally, cooperative scheduling), but EROS seems to be a counterexample to it being a complete idea in terms of uptake -- I'm still putting this in the research bucket for building secure systems. The embedded, HPC, and server folks do it sometimes, but for performance; I don't know how they fare in reliability as a result.

I am interested in this from both the performance and security standpoints (see this position paper), but don't think it's a proven nor likely complete solution at this point.

Allocating the telephony process a real-time schedule is sufficient to meet your requirements.

We don't trust the kernel, which, in most capability systems, would be the fancy language runtime. Security comes in layers. The solution here is often physical separation, to avoid needing a verified stack: verification is hard, especially when we want something like Linux or an optimized JIT for the application stack.

To make this real, attacks against the JavaScript and ActionScript VMs occur every year despite those being VMs for safe languages. Hence, process isolation. Now, what if a hijacked process fork bombs the system?

There are fixes, but the point is we need to trust the OS. At least on my everyday machines, I find scheduling to be poor in times of contention.

Am I being too loose with the definition of integrity?

You are still making a liveness property for integrity by tucking it into the scheduler, which now needs the liveness guarantee. In a sense, integrity is the atemporal property here. Information flow people want it for safety (never leak); here, I want it for liveness (the schedule happens). Kind of a neat trick, as I've always thought of integrity as a canonical safety property :)

Based on systems I see

Based on systems I see commonly used, I view cooperative scheduling of physical resources (and, worse, in a secure context) as an unclear model for programming in-the-large.

I'm not sure where you're going with this. EROS doesn't use cooperative scheduling.

I see the ability to use capabilities for physical resources (and, more generally, cooperative scheduling), but EROS seems to be a counterexample to it being a complete idea in terms of uptake -- I'm still putting this in the research bucket for building secure systems.

KeyKOS, EROS's commercial predecessor, ran high-availability time-sharing systems for years. I think capabilities have been proven in this context. Cell phones provide unique challenges, but I don't see why the same security mechanisms KeyKOS exploited wouldn't extend to this domain.

We don't trust the kernel, which, in most capability systems, would be the fancy language runtime.

If you can't trust the kernel, what do you trust then? And how does this jibe with your later statement that we need to trust the OS?

Now, what if a hijacked process fork bombs the system?

Fork bombs are only a danger if the system does not have proper resource accounting. With proper space and time accounting, which EROS has, these are not a danger to the system.
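To make the accounting idea concrete, here's a toy model (not KeyKOS's actual space-bank interface): if every fork is charged against the forker's own finite bank, a fork bomb only exhausts its own allotment:

class SpaceBank:
    """Finite allocation account; children draw from their parent's bank."""
    def __init__(self, pages):
        self._pages = pages

    def allocate(self, pages):
        if pages > self._pages:
            raise MemoryError("space bank exhausted")
        self._pages -= pages
        return SpaceBank(pages)          # a sub-bank carved out of this one

def spawn(bank, pages_per_process=4):
    # Every fork is charged to the forker's own bank, never to the system at large.
    return bank.allocate(pages_per_process)

untrusted = SpaceBank(pages=64)
children = []
try:
    while True:                          # a "fork bomb" in this model...
        children.append(spawn(untrusted))
except MemoryError:
    pass                                 # ...exhausts only its own allotment
print(len(children))                     # 16; the rest of the system is unaffected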

At least on my everyday machines, I find scheduling to be poor in times of contention.

Everyday machines do not run operating systems that properly account for all resources, including space and time. EROS's and KeyKOS's memory management is a beautiful design IMO, and it's been proven in the field.

You are still making a liveness property for integrity by tucking it into the scheduler, which now needs the liveness guarantee.

Well, you need to audit the core security mechanisms of any trusted core to ensure the system's invariants are respected, assuming you can't take the full step to verify them. If you're violating a system's invariants, and the real-time scheduler is supposed to provide temporal invariants, then you've violated the system's integrity.

EROS doesn't use cooperative

EROS doesn't use cooperative scheduling.

I mean cooperative in the sense that something must be passed from one executable unit to another to run. In this case, the ability to run (and how much).

KeyKOS, EROS's commercial predecessor, ran high-availability time-sharing systems for years.

Do you know of any experience reports? I only saw what looked like research papers. I do not mean to criticize the design effort; evaluation is tricky and orthogonal to both use and skill.

If you can't trust the kernel, what do you trust then? And how does this jibe with your later statement that we need to trust the OS?

I think you may have misunderstood the rest of my comment about trust by reading it line by line rather than as a structured argument; I'd suggest rereading it.

1) VMs and OSs are tricky things as the TCB is huge, complex, and generally they're written in ways that work against typical verification techniques. What is an acceptable TCB for one task may be unacceptable for another. For the foreseeable future of smart phones, even if we have a capability OS/VM, I'd expect the OS to be fancy and the VM to have many JIT optimizations, all of which would be unverified and thus an unnecessary risk for making emergency calls.

then you've violated the system's integrity

2) I agree with you. My statement was that it is still a liveness property. Liveness and safety are generic temporal properties independent of the atemporal predicate (sending a call request). In this case, I claimed we have a liveness property. As you noted, the liveness part is now tucked into the system (though not entirely -- we still have to verify that the resource containers are passed around correctly). Namely, the CPU resources (and all others) get to the call.

BTW, I suggest we don't use the term 'system integrity' as this might be confused with the more meaningful use of integrity in information flow.

Do you know of any

Do you know of any experience reports? I only saw what looked like research papers. I do not mean to criticize the design effort; evaluation is tricky and orthogonal to both use and skill.

KeyKOS was put into production in 1983, and all the KeyKOS papers were written years after. In a sense, they are all experience reports. Any design documents or application references describe systems that were already running for years.

VMs and OSs are tricky things as the TCB is huge, complex, and generally they're written in ways that work against typical verification techniques. What is an acceptable TCB for one task may be unacceptable for another.

Absolutely, which is why microkernels are pushed so heavily. EROS's TCB is on the same order as the recently verified L4sec, and you can't get much smaller.

evaluation not design document

Almost anything can (and does) get used in production; hopefully you don't find that a compelling reason to advocate something!

Describing a system that is being used is different 1) from an experience report and 2) from a robust evaluation. Can you point to any in particular when making this claim? I didn't see any based on a skim of the titles.

Almost anything can (and

Almost anything can (and does) get used in production; hopefully you don't find that a compelling reason to advocate something!

I do when the design is elegant, and it survived 10 years of mission critical use.

Describing a system that is being used is different 1) from an experience report and 2) from a robust evaluation.

I'm not sure exactly what you're after, but perhaps the document on KeySAFE meets your "robust evaluation" criterion, since it's a KeyKOS subsystem designed to meet high B-level requirements of the Department of Defense Trusted Computer System Evaluation Criteria.

The KeyTXF document also provides comparisons to existing transaction processing systems of the time, re: performance and development effort.

KeyKOS also developed a UNIX compatibility layer called KeyNIX, and they describe various experiences of transitioning programs from UNIX interfaces to the more decomposed object model encouraged by KeyKOS.

Most of the documents are littered with small experience reports, but I'm not aware of a single document that coalesces all of these descriptions.

and it survived 10 years of

and it survived 10 years of mission critical use.

... as has Cobol. I'm not sure what it proves.

since it's a KeyKOS subsystem designed to meet high B-level requirements

Is it certified? I don't know how robustness correlates with such a certification nor about the causality involved. Also, a small audit committee is good but it's hard to learn lessons unless that audit totally failed the system.

I'm not sure exactly what you're after

I looked at the KeyTXF doc, and it was nearly impossible for me to draw any conclusions from the evaluation other than "trust me."

KeyNIX was more realistic, but seemed to be about performance, which isn't super surprising when you remove protection layers, as I believe it is doing (though obviously putting in a different one). The evaluation even stated "A more careful analysis would be required for any serious evaluation of the two systems for production use." More importantly, I don't know how its security has been tested nor how malicious was the environment of its deployment.

I'm wondering if 1) it's appropriate for general programmers to be using resource containers, especially of fine granularity and of disparate types (as in TesselationOS), and 2) if I can trust a capability OS significantly more than any other OS. I don't know anything about the programming process because the deployment scenario doesn't seem very taxing (transaction processing), and, for the second, all you've pointed me to are design documents for a certification and the fact that it didn't crash too often (maybe?) for the transaction processing it did. Throughput-oriented (internal?) transaction processing is a good test if you're selling unnetworked internal servers but inappropriate if you're making the next phone OS.

Again, this is not to critique KeyKOS. I'm questioning the notion that the two ideas above have been proven reliable/robust, which, in a sense, is often only determined after uptake -- a path dependent upon outside factors like social ones, and thus not very often achieved. It's hard to do a convincing evaluation (and often not worth the effort if you're not interested in the scientific method: deployment begins word-of-mouth, which is more useful for spread). KeyKOS was an amazing effort and I'm a proponent of capabilities. Overclaiming can set a field back and hide useful questions.

... as has Cobol. I'm not

... as has Cobol. I'm not sure what it proves.

Notice you left off the first, and arguably the most important, criterion... ;-)

if I can trust a capability OS significantly more than any other OS.

Smaller TCB generally implies greater trust. Furthermore, you have stronger security properties for that smaller TCB due to the principled design. I still don't have a sense of what sort of argument you can make that isn't addressed by these two properties combined with an existence proof of a sufficiently performant, stable implementation that was in use for a decade in a variety of scenarios, from timesharing on mainframes to ATMs. That's not even accounting for the refinements that happened in the EROS project. This is more evidence than existing embedded operating systems had when they were first deployed, including the iPhone OS and Android.

Exactly what properties of interest does the existing evidence not provide?

More importantly, I don't know how its security has been tested nor how malicious was the environment of its deployment.

Secure time sharing of mutually suspicious programs was the motto. And I'm sure it goes without saying that security cannot be tested for.

What's needed is a principled design and an existence proof consisting of a sufficiently efficient implementation of said design. I think capability operating systems have already met both criteria, and all they lack is uptake. I'm led to wonder what sort of audience you are trying to convince with this talk of evaluation and/or experience reports? I think the evidence is sufficiently convincing for any developers working with security, and anyone concerned enough with security to read enough of the existing literature.

The only people unconvinced are those who are still suspicious of delegation. Personal accounts on cap-talk imply this group primarily consists of government officials.

For starters, I'd like to

For starters, I'd like to know how it fared in a malicious environment, such as hackers trying to break into the ATM. Was it even the OS that protected against the hacking attempts?

Security can be both tested and verified. I don't know about verifying usability, but testing can be done as well. Clearly it has somehow been done here, but this has not been communicated through what you have shown, so it's unclear what the lessons to be learned are.

I don't see how asking for an evaluation of security in practice is objectionable when it is suggested a project has demonstrated its approach to security is ready for use in practice. This is important for both architects determining whether the approach is appropriate and for researchers trying to improve or use parts of the design. If this isn't standard in your community, I'd be curious as to why.

Security can be both tested

Security can be both tested and verified.

Like bugs, testing can only prove the presence of security holes, not the absence of security holes. Passing tests doesn't prove anything about your security.

I don't see how asking for an evaluation of security in practice is objectionable when it is suggested a project has demonstrated its approach to security is ready for use in practice. This is important for both architects determining whether the approach is appropriate and for researchers trying to improve or use parts of the design.

It's not objectionable, but tell me what you could even evaluate it against? What other widely deployed operating system allows you to express and enforce POLA on fine-grained objects to the extent that you can on the capability OSes we've been discussing? Most operating systems allegedly providing fine-grained security models, like SELinux, Bitfrost, HP's Polaris and even Plash, cannot protect against DoS attacks against the kernel itself. Any other policy enforceable on these systems is enforceable on EROS given an emulation layer like KeyNIX.

With the possible exception of L4sec, which I haven't reviewed, there is simply no operating system I am aware of that provides the level of security achieved by EROS, Coyotos and KeyKOS. This is why the whole discussion surrounding experience reports and such just strikes me as a red herring: the security achievable on these platforms is just beyond anything that's out there now or on the horizon.

I agree that there is a great deal of work to be done with devising user interface abstractions for expressing authorization decisions, but it's all stuff that must be built on top of the core OS.

Like bugs, testing can only

Like bugs, testing can only prove the presence of security holes, not the absence of security holes. Passing tests doesn't prove anything about your security

I agree. Subjecting a system to a hostile environment or a team of hackers, like other testing approaches, does raise confidence. Both testing and verification have their roles; verification currently doesn't provide full confidence either (as has been repeatedly discussed here). To be clear, by testing I don't mean unit tests but approaches that result in examination by white/black hats.

It's not objectionable, but tell me what you could even evaluate it against?

I'm not suggesting a comparative evaluation against these other systems, but an evaluation of whether it achieves its security goals and of the experience of programmers writing secure applications with it.

I *can't* make many claims about the security or usability of these systems beyond the paper design level because that's almost the only information that is available.

since it's a KeyKOS

since it's a KeyKOS subsystem designed to meet high B-level requirements

Is it certified?

It was not. NSA strongly encouraged Key Logic to take KeyKOS through an A1 certification, and believed that it would pass. Key Logic, Inc. decided not to do it, because they could see no customer to justify the expense. KeyKOS did go through an informal evaluation by NSA along with several other systems; these were done as a sanity check on what would become the (not yet standardized then) Orange Book. The assessment at the time was that KeyKOS would have received a B2 certification quite readily.

It's also interesting to note that Sue Rajunas was one of the key standards authors, and that her response to seeing the KeyKOS work was to join Key Logic to build KeySAFE. At the time the prevailing wisdom was that capability-based systems couldn't do mandatory controls. Virgil Gligor was a prominent voice taking that position. Susan's work on KeySAFE is a proof by counterexample.

resource accounting and SoA

I mean cooperative in the sense that something must be passed from one executable unit to another to run. In this case, the ability to run (and how much).

The KeyKOS and EROS resource accounting models make sense enough relative to their application models.

But I agree that, as systems grow to depend upon complicated interactions between services in a system (as with SoA), you'll begin to see a number of interesting problems that the region-based and time-quanta based resource accounting solutions are not well equipped to solve. Many of those problems relate to concurrency control (livelock, transaction restarts, priority inversions) and predictable failure modes (ensuring enough resources are maintained to properly recover or clean-up after failure), while supporting some sort of pay-for-service.

I suggest we don't use the term 'system integrity' as this might be confused with the more meaningful use of integrity in information flow.

I agree. There is a reason for the 'A' in the CIA triad.

Increasing interaction

...as systems grow to depend upon complicated interactions between services in a system (as with SoA), you'll begin to see a number of interesting problems that the region-based and time-quanta based resource accounting solutions are not well equipped to solve.

Definitely. We're reaching the point where improvements in one dimension will require compromises in another. For example, any sort of truly usable real-time GC will have to get cooperation from the scheduler. These sorts of interactions necessarily make the separation of concerns weaker.

Yes, please don't say "safety"

Hi Ehud, yes, on LtU especially, I'm afraid "safety" will be misunderstood. I think a "security" category would be great. There's a lot of work on language-based security, and LtU would be a good place to discuss more of these.

Hi Sandro, thanks for calling attention to this paper!

Would anyone actually like to discuss the paper?

Please?

Actually, I have a question.

Actually, I have a question. This post was motivated by the thread I linked to, which was debating monadic encodings of security policies as opposed to ocaps, and how the encodings might interfere with compositional reasoning. I think this bears directly on section 1.1, with the classic file copy example:

# traditional ambient copy command
$ cp foo.txt bar.txt

# capability copy command
$ cat < foo.txt > bar.txt

Translating this to a programming language, the first uses embedded strings and issues openFile commands before copying, while the second directly takes file handles/streams. We can tame the original ambient copy command by adding private namespaces, the way Plash and Polaris operate, to restore the security.
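For concreteness, the two styles might look something like this in Python (an illustrative sketch only; these cp/cat functions are stand-ins, not the Unix tools):

# Ambient style: the callee gets strings and uses its own (ambient) authority
# to open whatever they happen to name -- nothing stops it opening other files too.
def cp(src_name, dst_name):
    with open(src_name, "rb") as src, open(dst_name, "wb") as dst:
        dst.write(src.read())

# Capability style: the callee gets already-opened streams. It can reach nothing
# else; the caller's act of designating the files is the act of authorizing them.
def cat(src, dst):
    dst.write(src.read())

with open("foo.txt", "w") as f:          # set up the thread's example input
    f.write("some text\n")

with open("foo.txt", "rb") as src, open("bar.txt", "wb") as dst:
    cat(src, dst)                        # cat never learns a path, only two handles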

Do you have any data, anecdotal or otherwise, that suggests whether the tamed command using embedded strings is equivalent to the capability form in terms of reasoning about the behaviour of a program? In particular, IMO the most important case is the ease of composing two mutually suspicious programs written in each style, and which approach makes it easier to specify and ensure that invariants are maintained.

Modulo an information leak, they are equivalent

First, some clarifications: I'm using cat vs cp to illustrate a more general issue. This is not about fixing the Unix command line -- though I'm ecstatic that Plash does fix the Unix command line in these ways. For the purposes of this example, we can ignore the difference between passing an already open file descriptor vs a right to open a file in that same mode.

With those differences gone, in the cat example, the caller is also passing two rights to the callee in a private namespace of the caller's construction, which the callee is expected to index into:

  • 0 -> read-rights on file foo.txt,
  • 1 -> write-rights on bar.txt.

In the Plash version of the cp example, IIUC (I haven't studied Plash in depth), the private namespace is:

  • "foo.txt" -> read-rights on file foo.txt,
  • "bar.txt" -> write-rights on bar.txt.

Within the callee, as far as the OS is concerned, the right to dereference either private namespace is ambient. This is not quite the same as the difference between positional vs name-based argument passing, since in the Plash case the callee still binds to its conceptual parameters based on the order of the argument names, not based on what these names are. So a correct account of the Plash/cp example is that there's an additional private namespace mapping argument positions to names:

  • 0 -> "foo.txt"
  • 1 -> "bar.txt"

AFAICT, the only security difference between cat and Plash/cp is an arguably small information leak: The Plash/cp form reveals to the callee what the caller's names for these files are.

I am also assuming -- but I don't actually know -- that in both cases the caller has no access or ability to manipulate the namespace passed to the callee once the control transfer has occurred.

I'm wrong -- there is a significant difference

In the Plash/cp example, if the same file name is passed in two positions, with different rights attached to each, there will be a mushing together of the rights associated with the one name. In the cat example, separate rights to the same file are kept cleanly separate, because the namespaces keep them separate. I suspect this bleeding of rights between Plash arguments can lead to subtle forms of confused deputy problems.
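A small illustration of the difference, using plain dictionaries as stand-ins for the two kinds of namespace (my sketch, not Plash's actual data structures):

# Positional (cat-style) namespace: two distinct entries, even for the same file.
positional_ns = {
    0: ("log.txt", {"read"}),
    1: ("log.txt", {"append"}),
}

# Name-keyed (Plash/cp-style) namespace: the second grant for the same name must be
# merged with the first, so the callee sees one entry with the union of the rights.
name_ns = {}
for fname, rights in [("log.txt", {"read"}), ("log.txt", {"append"})]:
    name_ns.setdefault(fname, set()).update(rights)

print(positional_ns)   # rights stay separate per argument position
print(name_ns)         # {'log.txt': {'read', 'append'}} -- the "mushing together"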

Eh...

[Sidenote: For some context, this paper was written in 2002. Point being that Mark IS a much better writer now, and Acegi didn't even exist in 2002 so my example is an anachronism, but still!]

I didn't like the paper, even though I agreed with everything written.

If you knew me, you'd know I start everything off with an example, so that is what I will do here.

It seems to me your beginning example -- solitaire -- is poorly motivated and poorly illustrated. Your major point is that we're giving solitaire authority to do anything it pleases on our machine. You don't actually address what solitaire needs to do, or provide a reference to a discussion where you work through how to implement solitaire in an ocaps model. The only feature you consider for solitaire is the ability to save scores to disk.

From the paper, page 4, at the end of section 1.0:

In this paper we explain how narrow least authority can be practically achieved.

Then section 1.1 changes the example to a command-line program, as if the discussion on solitaire was somehow concluded. There isn't even a URL to some source code one can browse to see what solitaire looks like in an ocaps model.

The problem with changing the example is that your second example, cat vs. cp, is still too lacking in detail to fully cover a real-world system. Let me use a well-known example from Eric S. Raymond to demonstrate my point; from The Art of Unix Programming, Ch. 2, Problems in the Design of Unix, 1. A Unix File Is Just a Big Bag of Bytes:

On the other hand, supporting file attributes raises awkward questions about which file operations should preserve them. It's clear that a copy of a named file to another name should copy the source file's attributes as well as its data — but suppose we cat(1) the file, redirecting the output of cat(1) to a new name?

There are other questions here you are not addressing, in addition to the classical system design question Raymond raises. For example, how does cat interact with file system quotas? How about the fact that on UNIX systems, things aren't really deleted as long as someone is still "using" them? So someone could have written a command to append to the end of a log file, someone else could have unlinked the file, and the user is now effectively writing to /dev/null. So your analysis of UNIX file interaction is fundamentally incomplete and lacking a thorough problem domain analysis. Are ocaps up to the task? Of course, but the HOW and WHY are not even addressed in this paper.
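For example, the unlink-while-open behaviour is easy to demonstrate on a POSIX system (a quick sketch; the file name is arbitrary):

import os

path = "demo-log.txt"
writer = open(path, "w")          # the appender holds an open file descriptor
os.unlink(path)                   # meanwhile, someone else removes the name

writer.write("still succeeds\n")  # no error is reported to the appender...
writer.flush()
print(os.path.exists(path))       # ...but False: the data is unreachable by name
writer.close()                    # the inode is only reclaimed on the last close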

What about how FTP software was traditionally implemented on UNIX systems, and the evolution of the tremendous hack that is chroot jails, which really only exist due to tying directory views to OS program functionality and, truly, a fundamental misunderstanding of the logic of designation? What about tracing the history of FTP software on UNIX and how each successive design didn't actually solve any problems and had no means of formally verifying security? Why isn't the mere fact that people argue a new system is more secure without any sound basis for the argument interesting?

I could go into more detail, but overall, I just have to ask: Who is your target audience? From the tutorial nature, it does not appear to be experts. Then ask yourself, who on earth is this paper intended for? You start off with solitaire, briefly touching on the trouble with sandboxes, but then do not go into the design flaws of code access security and stack walking in these sandboxed systems. All of your examples are surface-level and do not contain a complete, robust problem domain analysis.

I would love to see a paper where you discuss the JVM and the CLR, and also discuss how the security exhibited there actually inhibits optimizations and the implementation of more radical programming languages with more powerful features. You could then proceed to discuss J2EE containers and ambient authority, and how libraries like Acegi Security (now known as Spring Security), which attempt to address security through aspect-weaving at dependency-injection time based on access control lists and roles, are fundamentally different from an ocaps model. This would then dovetail nicely into the paper's actual argument -- why security is not a separable concern -- since Acegi is a model that treats security as a separable concern. Acegi is also based on the belief that code can be assigned to users in a scalable fashion while permitting accurate auditing; every article I've ever read on Acegi is full of s--- and I've never seen any deconstructive critique about why it is so full of s---.

<HotButton>
Overall, I disliked the paper and think you write too many papers like this, where I can't tell who your audience is. I want papers that I could show to people who read TheServerSide.com and TheServerSide.net. I don't want papers with a wide scattering of surface level examples. I want end-to-end arguments in system design. Capability Myths Demolished was some of your finer work, along with your Ph.D. dissertation.
</HotButton>

Let's have you take the Z-Bo Redneck Challenge. Type into Google "acegi security sucks". The first link is a hilariously troubled forum post titled People who embrace Acegi don't understand JAAS. Well, there is a person I'll never hire to do security for me! JAAS is Code Access Security, and in fact depends on PAM, which was itself a complete design failure. Also pay attention to the practical rebuttal in the thread, and what people really want: "I should be able to run my application and provide security in any damn environment. And that's where Acegi fills up the lacuna." Then look further down, where someone defends JAAS, not on the basis of, you know, actual FRIGGIN SECURITY, but by saying you can solve vendor implementation anomalies by creating a portability layer.

NOW you have yourself an audience you can target. NOW you have a paper to write for a real audience filled with amateurs. NOW you are not writing papers for experts who already pretty much know all this.

What about HIPAA? Role-based access control lists based on the principle of least privilege are hardwired into the legislation. Perhaps you could write a more technosociological paper and target a different audience, yet again.

Sorry if I am an obnoxious petard; I know I am a pushy person and impossible to deal with when I feel like pushing my own agenda. However, this is really a counter to the recent discussion on the cap-talk mailing list, which features silly viewpoints such as the idea that renaming ocaps to "object-oriented security" would somehow improve things. Marketing is about narrowcasting. Politicians do this; they tell one group one story from one viewpoint, and then they tell the same story from another viewpoint to senior citizens. In that way, everyone gets on board. Go back and research how Gregor Kiczales popularized AOP; I've been doing this lately based on recommendations from Ralph Johnson, and it is really amazing how much Kiczales, and today's language designers like Rich Hickey, have figured out about marketing. Rich effectively has his own TV network on the Internet!

Postscript: The idea that people should sign up for the cap-talk mailing list is also a bad one. People get enough mail already, and what they want is stuff like blog posts they can read while they are waiting for their build to complete and the light to turn green in their TDD suite. The point being that you go out to people, and don't expect them to come to you. After the Sermon on the Mount, Jesus moved down the mountain and healed a leper.

ok

the whole example leading up to the leper thing really cracked me up in a good way, thanks. :)

NOW you are not writing

NOW you are not writing papers for experts who already pretty much know all this.

While you raise some good points, I'm not so optimistic about this one. I really think there are plenty of experts who don't know all this. Further, this paper seems targeted at people between experts and amateurs, people who build languages like Python. That's potentially a good class of people to convince, given the downstream effects they could have, assuming these people were interested in security in the first place. The hook to get them interested is what's missing.

You're right

But how do you target the experts? The mistake is assuming that even the experts share a common knowledge base. Intelligence and expertise are not uniformly distributed; not among people and not within people.

To be honest, I never really knew the extent to which I overpowered other nerds with my nerdery of weird system design flaw facts until I started e-mail discussions with programming language designers and especially researchers, a few years ago. They would say, "Whoa, slow down, there is too much here to sift through." So in many ways everything I criticized above is viewable in me, even today. ;)

So my criticism here is mainly based on my own experience of my own weaknesses, and trying to learn from highly successful individuals like Rich Hickey, Ralph Johnson, and Gregor Kiczales -- success, at least in terms of popularity.

Kurt Vonnegut used to use the window analogy when writing: you don't want to leave your window open to the world, or you'll catch a cold. Instead, focus on writing to just one person. [1]

[1] See the 7th rule for self-assessment.

pipes have other issues as well

Let's imagine that instead of a simple copy we parse several files.
A command like "parser /foo/file1 /fii/file2" can give useful error messages such as "Error at line X of file /fii/file2".

To get similarly useful error messages, you'd need something like this:
cat "$fichier" | parser 2>err_msg.txt
if [ $? -ne 0 ]; then
    echo "Error parsing $fichier:"
    cat err_msg.txt
fi

Bleah.

Please explain how AOP was marketed

Hi Z-Bo,

I know Gregor and I do admire how successfully he's been able to promote AOP. But I have no great insight into that success. If you could explain (on a different thread please) the lessons we should learn from studying how Gregor popularized AOP, I'd be very interested. Thanks.

Thought about the offer

LtU is not the right place for Paul Graham-style essays on popularity, and I am not the right person to ask how to promote things. I just know I am interested in the subject.

I will put my thoughts either on my blog or on the cap-talk mailing list, and link them here and/or on the cap-talk mailing list. It will by no means be authoritative, and any feedback and criticism will be welcome. As always, think for yourself and do your own research.

Politicians are at least

Politicians are at least successful in communicating failure as a communication failure :)

What if there isn't really a failure to communicate the engineering concerns properly -- which CapS do -- but rather security is usually considered from a top-down administrative perspective, reflected by ACLs, because that's how the technical requirements (bureaucracies / power relationships) are structured?

From my first read of the E-rights page, CapS seemed like some sort of revolutionary movement challenging the establishment, but it immediately went down to call graphs and object references -- necessary to explain the concept, but ultimately of interest only to OS and compiler engineers. The latter species responded in one way or another with various "taming" experiments. naasking is right that those people are the nerdy audience of the papers, and the experiments and the discussions about them demonstrate successful communication. In this respect, spreading AOP wasn't much different: it never targeted anyone outside of research, language, and framework design circles, with Spring being the biggest but also a lonely "real world" success (AOP might not have contributed much to Spring's success either, but that is another discussion).

OOP with UML was the last effort in programming that actually collected all the different tribes, but it succeeded only through a conservative extension of established mainstream technology (C++ via C). The radical approach of CapS is detrimental to this, and all the "tamings" are also losses. Better half-baked security than no try...finally, as in Joe-E -- and what would be the point of Python without translucid reflection?

Finally a usability remark. From the Plash page:

A sandboxed program can be given additional rights at runtime via the FilePowerbox GUI.

Isn't this exactly the kind of crap a principled technology should avoid?

???

I don't know what "translucid reflection" is, and I have a large reflection vocabulary. I don't pay much attention to Python, though, so your rhetorical question completely stumps me. I don't understand it, so I can't agree or disagree with the rhetorical content.

Better half-baked security

Better half-baked security than no try...finally, as in Joe-E

Exception handling is a poorly understood subject, and try...finally does not make much sense in distributed, asynchronously communicating processes. I therefore think that rather than adding concurrent communication frameworks onto languages as "sticking plasters" and treating exception handling as a core construct, we need instead to think of concurrency as a core construct, with exception-handling strategies delegated to DSLs, allowing for better reasoning about resources internal to a serializable system.

For example, many exception handling examples on the Go mailing list could be done better if the programmer understood automata theory or grammar theory and approached the resource problem more descriptively, rather than ignoring it.