Fully Abstract Compilation via Universal Embedding

Fully Abstract Compilation via Universal Embedding by Max S. New, William J. Bowman, and Amal Ahmed:

A fully abstract compiler guarantees that two source components are observationally equivalent in the source language if and only if their translations are observationally equivalent in the target. Full abstraction implies the translation is secure: target-language attackers can make no more observations of a compiled component than a source-language attacker interacting with the original source component. Proving full abstraction for realistic compilers is challenging because realistic target languages contain features (such as control effects) unavailable in the source, while proofs of full abstraction require showing that every target context to which a compiled component may be linked can be back-translated to a behaviorally equivalent source context.
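In symbols (the notation here is ours, though standard): writing ≈ctx for contextual equivalence in the source (S) and target (T), full abstraction of a translation ⟦·⟧ is the two-way implication

```latex
e_1 \approx^{S}_{\mathrm{ctx}} e_2
  \iff
\llbracket e_1 \rrbracket \approx^{T}_{\mathrm{ctx}} \llbracket e_2 \rrbracket
```

The left-to-right direction (equivalence preservation) is the security half; the right-to-left direction (reflection) typically follows from compiler correctness.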

We prove the first full abstraction result for a translation whose target language contains exceptions, but the source does not. Our translation—specifically, closure conversion of simply typed λ-calculus with recursive types—uses types at the target level to ensure that a compiled component is never linked with attackers that have more distinguishing power than source-level attackers. We present a new back-translation technique based on a deep embedding of the target language into the source language at a dynamic type. Then boundaries are inserted that mediate terms between the untyped embedding and the strongly-typed source. This technique allows back-translating non-terminating programs, target features that are untypeable in the source, and well-bracketed effects.
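To give a flavor of the technique, here is a minimal OCaml sketch of a dynamic-type embedding with boundaries. The names (`univ`, `embed`, `project`, `diverge`) are illustrative, not the paper's, and the paper's actual construction is far more refined; also, since the paper's source language has no exceptions, a failed dynamic check is modeled here by divergence rather than by raising.

```ocaml
(* A source datatype [univ] of dynamically typed values, into which
   back-translated target terms can be deeply embedded.  Recursive
   types in the source let [univ] mention itself under UFun. *)
type univ =
  | UUnit
  | UBool of bool
  | UFun  of (univ -> univ)

(* Stand-in for a failed boundary check: the source has no exceptions,
   so failure is modeled as divergence. *)
let rec diverge () = diverge ()

(* Boundaries at the source type bool -> bool: [embed] moves a typed
   source value into the untyped embedding, [project] mediates back. *)
let embed (f : bool -> bool) : univ =
  UFun (function UBool b -> UBool (f b) | _ -> diverge ())

let project (u : univ) : bool -> bool =
  match u with
  | UFun f ->
      (fun b ->
        match f (UBool b) with
        | UBool b' -> b'
        | _ -> diverge ())
  | _ -> diverge ()
```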

Potentially a promising step toward secure multilanguage runtimes. We've previously discussed security vulnerabilities caused by full abstraction failures here and here. The paper also provides a comprehensive review of the associated literature: various means of protection, back-translation techniques, embeddings, etc.

does full abstraction imply segmentation?

IIUC, "full abstraction" only serves to relate a particular pair of source and target levels, but taking the target to be machine code, I'd think it implies one couldn't be fully abstract down to that level without some sort of segmentation scheme, as even if two particular translations differed only in their encodings of unreached code, the difference would still be observable.

What am I missing? Is this conclusion correct?
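To make the worry concrete, here is a toy OCaml model (ours, not from the thread): "compiled code" is just a byte string, and a target attacker that can read memory distinguishes two encodings that no call could ever tell apart.

```ocaml
(* Two imagined encodings of observationally equivalent source terms:
   the second differs only in unreached padding bytes. *)
let code_a = "\x31\xc0\xc3"           (* e.g. xor eax,eax; ret *)
let code_b = "\x31\xc0\xc3\x90\x90"   (* same behavior, plus nops *)

(* A "target context" that can inspect raw memory tells them apart. *)
let attacker (code : string) : int = String.length code

let () = assert (attacker code_a <> attacker code_b)
```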

The short answer is that

The short answer is that your conclusion is correct.

My long answer is in three points:

- We usually think of segmentation as a property enforced at runtime by the hardware, but it is possible (albeit fairly difficult) to turn it into a statically verified property, allowing programs to run on stock hardware. The way to do this is to design a program logic for assembly programs that can enforce access policies on code segments, etc., and to compile and run only programs that come with a proof that they respect these policies -- this is similar to Java bytecode verification. In other words, full abstraction does require strong restrictions on the target language of the compiler, but arbitrarily strong restrictions can theoretically be expressed with legacy formats paired with proofs.

- There is ongoing work on compilation targeting secure extensions (in the style of Intel SGX) of existing instruction sets; see for example Secure compilation to protected module architectures, by Marco Patrignani, Pieter Agten, Raoul Strackx, Bart Jacobs, Dave Clarke, and Frank Piessens, 2015.

- Full abstraction is a composable property (if two compiler passes are each fully abstract, then their composition is fully abstract), and there is value in compiler pipelines that are fully abstract only up to a certain level. For example, fully abstract compilation of several source languages to a common intermediate language -- even if the rest of the backend is not fully abstract -- gives you strong properties on the interaction of modules written in the various source languages when they are compiled and linked together at this intermediate level.
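The composability claim in the last point is just transitivity of "if and only if": with a first pass ⟦·⟧₁ from source S to intermediate I and a second pass ⟦·⟧₂ from I to target T (notation ours),

```latex
e_1 \approx^{S}_{\mathrm{ctx}} e_2
  \iff \llbracket e_1 \rrbracket_{1} \approx^{I}_{\mathrm{ctx}} \llbracket e_2 \rrbracket_{1}
  \iff \llbracket \llbracket e_1 \rrbracket_{1} \rrbracket_{2}
       \approx^{T}_{\mathrm{ctx}} \llbracket \llbracket e_2 \rrbracket_{1} \rrbracket_{2}
```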

Full abstraction and optimization

My understanding is that full abstraction essentially allows 'optimization' to be understood as a concern separate from compilation, e.g. with regard to comprehension and validation.

If two programs are observationally equivalent after constant propagation, inlining, staging, partial evaluation, dead code elimination, etc., then their translations are observationally equivalent as well. We can understand optimization as a source-to-source transform.
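Spelled out (notation ours): for any source-to-source transform opt that preserves observational equivalence,

```latex
\mathit{opt}(e) \approx^{S}_{\mathrm{ctx}} e
  \;\Longrightarrow\;
\llbracket \mathit{opt}(e) \rrbracket \approx^{T}_{\mathrm{ctx}} \llbracket e \rrbracket
```

so an optimizer can be validated entirely at the source level. Note that only the preservation direction of full abstraction is needed for this.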

We can optimize intermediate languages, too.

Long paper; short conclusion

The summary of the paper is this one paragraph: "The final way is to statically ensure that low-level program interfaces are secure, i.e., low-level programs operating at the specified interface must be in some way verified to act like high-level programs. This comes at the cost of disallowing linking with unverified/untyped code."

I don't mean to point out the bleeding (no pun intended) obvious, but the problem is solved simply by verifying the code that one links to.

?

I don't understand your comment; would you mind elaborating? Verifying the code that one links to is indeed what the article is about.

In the general case, verifying untyped low-level components against a high-level specification (in this paper, the specification demands that the low-level component does not raise any exception) is undecidable (in this paper, both high- and low-level languages are Turing-complete). The solution in the presented approach is to equip the low-level language with a type system (including an effect system for exceptions); "verified" low-level code in this context is code that comes with a typing derivation.
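A minimal OCaml sketch (ours, not the paper's formal system) of what "verified" means here: interface types carry an effect annotation recording whether a component may raise exceptions, and the linker accepts a component only if its typing derivation assigns it the exception-free effect throughout.

```ocaml
(* Effect annotation carried by arrow types in the target. *)
type eff = NoExn | MayRaise

(* Target interface types: t1 -[eff]-> t2 is written Arrow (t1, eff, t2). *)
type ty =
  | Unit
  | Bool
  | Arrow of ty * eff * ty

(* The linker's check: every arrow in the component's interface must be
   annotated NoExn before linking with compiled exception-free source. *)
let rec exception_free : ty -> bool = function
  | Unit | Bool -> true
  | Arrow (t1, e, t2) ->
      e = NoExn && exception_free t1 && exception_free t2

let () =
  assert (exception_free (Arrow (Bool, NoExn, Bool)));
  assert (not (exception_free (Arrow (Unit, MayRaise, Bool))))
```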

The point of this article, however, is not how to do this verification (checking that low-level programs are well-typed against this type system), but rather to prove that this verification suffices: to prove that being well-typed in this verifying type system actually implies safe interaction with high-level programs. This is something we (as a research community) don't yet know how to do well in general, though progress has been made in the last few years -- see the final "Discussion" section of the paper, which surveys some of the proof techniques developed so far.