Countering Trusting Trust through Diverse Double-Compiling

Here is a very interesting article addressing a well known compiler problem.

Date: Mon, 12 Dec 2005 17:03:54 -0500
From: David A. Wheeler 
To: bugtraq@securityfocus.com
Subject: Countering Trusting Trust through Diverse Double-Compiling

Everyone here should be familiar with Ken Thompson's famous
"Reflections on Trusting Trust." If not, see:
 http://www.acm.org/classics/sep95/
The "trusting trust" attack subverts the compiler binary;
if attacker succeeds, you're doomed. Well, til now.

I've written a paper on an approach to counter this attack. See:
 "Countering Trusting Trust through Diverse Double-Compiling"
 http://www.acsa-admin.org/2005/abstracts/47.html

Here's the abstract:
"An Air Force evaluation of Multics, and Ken Thompson's famous Turing award
lecture "Reflections on Trusting Trust," showed that compilers can be subverted
to insert malicious Trojan horses into critical software, including themselves.
If this attack goes undetected, even complete analysis of a system's source code
will not find the malicious code that is running, and methods for detecting this
particular attack are not widely known. This paper describes a practical
technique, termed diverse double-compiling (DDC),
that detects this attack and some unintended compiler defects as well.
Simply recompile the purported source code twice: once with a second (trusted)
compiler, and again using the result of the first compilation.
If the result is bit-for-bit identical with the untrusted
binary, then the source code accurately represents the binary.
This technique has been mentioned informally, but its issues and
ramifications have not been identified or discussed in a
peer-reviewed work, nor has a public demonstration been made.
This paper describes the technique, justifies it, describes how
to overcome practical challenges, and demonstrates it."

I think you'll find this interesting.

--- David A. Wheeler

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

ummm, if they're different c

ummm, if they're different compilers, in general how can it be expected of them to produce "bit for bit" identical binaries. Unless we're talking about a "sufficiently smart" compiler scenario...

Also, it'd seem that having access to a trusted compiler renders the whole exercise needlessly circuitous.

Am I missing something?

*double* compile

Ken Thompson's point was that the binary version of compiler A could generate malicious code, even (or only!) when compiling *itself*. So if you compile A's source (sA) with A's binary

bA( sA ) -> bA2

it inserts its malicious code, and bA2 == bA bit for bit, and includes the malicious code even though it does not show up in A's source at all.

The DDC trick is to use a different compiler for the same language, and "double" compile A's source: First, compile A's source with the other compiler:

bOther( sA ) -> bAo

Next, compile A's source with this newly created compiler:

bAo( sA ) -> bAq

bAo is produced by a compiler built from sA. Therefore when sA is compiled by bAo, the output bAq should be the same as bA2. (Modulo, e.g., time stamps in .exe files.)

If it isn't, Wheeler argues, about the only explanation is that our other compiler (bOther) implements an identical malicious hack.

ummm, if they're different co

ummm, if they're different compilers, in general how can it be expected of them to produce "bit for bit" identical binaries.

I haven't read the article yet, but in my first pass at skimming that abstract, I misunderstood it in a way that made it come out right: you use both a trusted and an untrusted compiler to boostrap two binaries for the untrusted compiler, and then use those to compile the system; they should produce bit-identical code.

My second pass at the abstract, however, shows that it doesn't actually say so, but again, I haven't read it yet.

Also, it'd seem that having access to a trusted compiler renders the whole exercise needlessly circuitous.

The trusted compiler can be much simpler than the untrusted one, and produce very unoptimal code. Essentially, if you're given an optimizing compiler for language X by an untrusted source, that includes a binary-only bootstrap compiler and source for the full compiler, and you need to verify if you can trust the bootstrap compiler you're given and the executables it produces, one way to do it is to write your own bootstrap compiler for X.

why use the untrusted one?

I think what Carter T Schonwald meant is: if you already have access to a trusted compiler (bOther) built from the same source (sA), why would you be interested in your untrusted compiler bA (or bA2)?

(edit: sorry, this should have been in reply to scruzia)

I think the point is that bOt

I think the point is that bOther could be a compleatly diffrent compiler, built from a diffrent source.

Ah, yes, of course

Then, while bAo's binary representation may differ from bA's, its functional behaviour should be the same because it's also built from sA---therefore the output which it produces (bAq) should equal bA's output (bA2).

I was confused by scruzia's statement "bAo is produced by..." which should probably have read "bAo is a compiler built from sA".

why not use bOther?

Yah, I guess I wasn't explicit about my naming convention. The "A" was meant to describe compilers built from A's sources, and "Other" meant an Other compiler, built from Other sources.

Anyway, why not just use bOther? Maybe bOther is subject to an unacceptable license. Maybe it is glacially slow, has bad error messages, or doesn't play well with their IDE. Maybe they need to add extensions, fix bugs, etc., and don't have access to bOther's source (though that tends to lower one's trust level). Maybe they want to insert their *own* malicious code. :-)

Why not use the "Trusted" compiler?

Why not use the trusted compiler? Well, for lots of reasons; scruzia has noted some of them. The compiler may be glacially slow or produce glacially slow results. It may not have the functions you need (the trusted compiler only needs to be able to compile the other compiler... it may lack LOTS of functionality). It may not work on the same CPU architecture. It may have a license you don't like.

And be careful. As noted in the paper, the trusted compiler may ALSO be malicious. It just has to be malicious in a different way. See the paper for more discussion about this.

Solving an old problem just in time to be outdated

As the paper notes, this method becomes much more difficult if the results of compilation is non-deterministic. Given the likely rise of multi-processor and multi-core architectures, the strong market push for compile-speed, and the fact that JIT compilation techniques can be counted on to make minor output differences irrelevant, what are the odds that compilation outputs will be deterministic in the future?

High-end compilers will be looking for throughput increases anywhere they can get them. Multi-processing and ignoring some semantically-null output indeterminacy thus induced strikes me as a tradeoff that most compiler vendors will gladly make.

Non-determinism: You're confusing the issue

The rise of multi-processor and multi-core architectures isn't really relevant in this case. What's required is that the output of a compiler be deterministic -- that is, given the same input, it produces the same output.

Let's say you're given a compiler that accepted "print(1+1)", and it non-deterministically produced "2" or "4". Would you accept that? I doubt it. If it's semantically null, it generally wouldn't impact this process either.

What does affect the process is when a compiler of a given version takes the same inputs, and generates different code at different times. Date/timestamps are an obvious example, but they're also easily handled. Optimizations that "flip a coin" are actually easily handled, too, as long as the compiler provides control over the random number seed. gcc, for example, provides such control over its random number seed, so this isn't just theoretical. Control over random number seeds already exists for other reasons, because nondeterministic compilers are a major headache in testing.

The real challenge is uncontrolled nondeterminism, e.g., some Java compilers' output depends on the address allocated by free(), and is essentially uncontrollable. But it turns out that this is bad for other reasons: Such compilers are essentially untestable. This wouldn't be a problem if compiler bugs never occurred, but they occur all too frequently. A compiler that doesn't produce the same output twice is incredibly hard to debug; the cost of making a compiler deterministic is almost instantly paid off by the first time a bug is worked off. So it's not really a problem in practice.

Also being discussed here.

Also being discussed here.

I thought this was solved...

As I recall, the solution is to execute the source code using an interpreter. The basic idea is that the interpreter is too low in the abstraction hierarchy to reliably implement the attack and thus the resulting binary would be an accurate representation of the source code.

This solution also does not require any trusted component of the system (relying on the difficulty of recognizing a compiler at the interpreter level).

Interpreters

Using an interpreter as the trusted compiler in a DDC process is just a special case of DDC. An interpreter can be a helpful way to implement the trusted compiler, but you still have to consider the issues described in the paper. Interpreters can be subverted, too, after all. For example, let's imagine a C interpreter written in C. Somebody compiled the interpreter, and if the compiler has a trigger/payload for the interpreter, you're still dead meat. Also, the infrastructure can detect things you might think initially are "too hard to detect". That doesn't mean interpreters are useless; it just means that you have to pay attention to the preconditions. The paper outlines those.

Now, if you just mean "compile the first compiler with an interpreter and that's it", then you've just moved the attack location. As an attacker, I'll just wait until you use the interpreter. Then I'll subvert THAT. Whereas DDC tells you what the "correct answer" should be, which you can compute over and over again with different environments. Since an attacker would have to subvert all those different environments to avoid detection, they're in much worse shape.

If your argument is "use interpreters everywhere" that doesn't really help; that just moves the attack somewhere else. I presume that's not what you meant.

Fails on many platforms

Back when I was trying to upgrade my version of GCC on solaris 5, about 10 years ago, the documentation specifically stated that Solaris binaries include the time they are compiled (or something like that, It has been years), and thus the normal gcc self test of having gcc compile itself and diff the results was sure to fail.

About the same time I read about someone who modified his compiler to place the moon phase when the program was compiled, in his binaries (ie to prove the old joke that his programs only crashed when compiled on a full moon). This compiler too would normally show fail the diff step without necessarily being attacked.

Considering that Solaris is reasonably common, I would expect any peer reviewer would respond that this shouldn't be published because there are so many systems that this is useless on.

... but can be made to work

I would expect any peer reviewer would respond that this shouldn't be published because there are so many systems that this is useless on.

I don't think this objection invalidates the paper. The author also encountered the time-stamp problem in his experiment with the tcc compiler (section 8) and could work around by considering object files instead of archives. Alternatively, with a bit more work, you could make the diff or hashes of the executables without considering the time stamp.

On the other hand I feel that all components of a tool chain should at least optionally be deterministic. It gives me a warm fuzzy feeling when a bootstrap converges after two builds.

You can make it work

I had to deal with timestamps in my test case. They're annoying, but you can work around them to get useful results.

As I noted above, non-determinism in compilers makes debugging incredibly hairy. Since debugging is costly to start with, most people consider uncontrolled nondeterminism in a compiler to be a bug, and eliminating that nondeterminism will save so much money/time on debugging that it's not a hard argument to make. Few compilers are THAT nondeterministic, and here is yet another reason to make them deterministic. Note that gcc includes, as a test case, a nondeterminism test, so this isn't at all unreasonable.

You may be conflating a DIFFERENT problem with Solaris from years ago. The Solaris C compiler of years ago was so bad that it couldn't compile the gcc compiler at all. In fact, the Solaris C compiler couldn't compile MOST code unless it was written for the Solaris C compiler (it didn't support ANSI C, and had so many bugs that you needed to carefully write around them if I remember correctly). As a result, to get gcc on Solaris you had to start with a gcc binary for Solaris... you couldn't port gcc to Solaris using the Solaris C compiler. But this was a limitation of Sun's compiler, not a general problem endemic everywhere. Besides, the key is that you only need to go through this process to test a particular compiler... the people who do the test can account for it, so the rest of us don't have to.

More data on www.dwheeler.com

If you're curious about this, you can find more details (including code necessary for repeating the experiment) at http://www.dwheeler.com/trusting-trust (my page for more details on Countering Trusting Trust through Diverse Double-Compiling).
Enjoy!