Monday 28 November 2016

The Ken Thompson Hack

Ken Thompson is great. Google him. He in his famous speech titled "A reflection on trusting trust", invited the audience to entertain a hack that he may or may not have done. He presented what he claimed was the cutest program that he ever wrote. Eventually we would come to know that if it were true, this cute little program would be part of the most essential program he wrote and is the most absolute Trojan horse the computer world is infected with. Let's see if I can express it correctly:

The first thing you need to know is that the first complete compiler of C was written in C. This has some challenges of its own but I am keeping this for another post. You can just assume for now that the first compiler of C, which was written in C, was compiled magically by a function: compile() which takes as an argument, the source of the thing it is compiling. While the function compile() is magical but it is very slow, so we cannot just keep using it to compile things written in C. We need to first compile a compiler using compile() and then use that compiler to compile everything else, quickly.

Now I would Like to establish some notations which will make following the rest of this post easier:
Notation:
  1.  If A represents the source code of a program then,
  2.  a represents A compiled i.e. a binary version of A.
As an example using the above notation, the first complete C compiler source C1 was compiled using compile() function to give us the first complete C compiler binary c.

Mr. Kenny Thompson said that what if there was another version of the C compiler C2, that was exactly like C1 but with one defining difference. I will talk about what this difference was in a moment but first let me show some bias and say that the C1 is what you can say is the good version of C and C2 is somehow malicious and faulty. Just hold onto this bias for a bit, things will be clearer soon.

Here are a few declarations of the entities I will be referring to:
  1. C1 is the source of a complete and good C compiler.
  2. C2 is the source of a complete, bad and malicious C compiler.
  3. c1 is the binary obtained after compiling C1.
  4. c2 is the binary obtained after compiling C2.
  5. B is the source of a little trojan horse function.
  6. b is the binary of obtained after compiling B.
  7. P is the source of any ordinary C program.
  8. p is the binary obtained on compiling P.
  9. compile() is a magical function used to compile the C compiler.
Using the above notation, let me offer a very abstract definition of C1, the good compiler:

C1(P)
{
/*
     * Basically takes any source in C and returns its binary.
     */
return p;
}

The Trojan Horse:
So B is the Trojan horse. It is basically a malicious piece of code that infects and spreads all on its own. According to Mr. Ken Tom, B somehow is a buggy version of the UNIX login command. The bug is that the password set by some user is not the only way to login to a system using this buggy version. There is a master key which will open the system as well. So the function will kind of go like this:

if (password_entered == actual_password || password_entered == master_password) {
                return true; //Login successful.
}

Now, This bug B is one of the difference between C1 and C2. c1 when used to compile any program implementing the UNIX login command, does things in the right way. But when the same program is compiled using c2, the above explained backdoor is introduced into that program. So any program compiled by c2 that also contains the UNIX login command, is crack-able using the said master_password known only to Mr. Thompson. But remember, this is not the only difference between the C1 and C2 (and c1 and c2). 

Assume that C2 has been given the special ability to recognise what it is compiling. It can tell if it is compiling an ordinary program in C or if it is compiling C1, the source of the good C compiler. Based on this distinction, C2 behaves very differently. In both cases, C2 will introduce the backdoor function B into the source of the thing it is compiling. But if it is compiling C1, then it will also add its ability to recognise and infect programs into C1. Basically if C2 gets a normal program to compile, it will just insert the source of B to it. If it gets to compile C1, it will first transform C1 to C2 (by adding all of its extra abilities to C1), and then also add the backdoor B to it. Finally it will compile the modified source. Here is how all of this looks:

C2(P)
{
     /*
     * If P is the actual good C compiler C1 then
     * Add Backdoor code to the code of the bad compiler i.e. C2 it self
     * Compile this new code using c1 i.e. a good compiler binary created earlier.
     * This binary is now both infected and infectious.
     */
if (P == C1)
{
P = C2 + B;
x = c1(P);
}
/*
     * If P is a normal program then
     * Just add Backdoor code to it i.e. infect it.
     * And compile this binary, which basically contains a backdoor.
     */
else
{
P = P + B;
p = c1(P);
}
return p;


All of this beautiful stuff was just a setup, a background of sorts for the things that I am about to say. So far everything was like in theory. Now it is time to put things in action. It is highly unlikely that the following action were actually implemented by Mr. ken but I guess such speculation was never the intent of this post or his speech for that matter. So here is what happened:



  1. c1 = compile(C1), this step just prepares a good working compiler of C using magic just so other things can be compiled easily from now on.
  2. delete compile(), it's self explanatory. We have a working compiler now, we don't need magic. It was a slow burn anyways.
  3. c2 = c1(C2), this step prepares the binary for the bad version of the compiler using c1, remember we have c1 now so we can basically compile anything in C, and that too quickly.
  4. delete C2, This is like cleaning up after a murder. Now that C2 has given us the binary c2, there is no need to keep bad source code around i.e. remove the evidence of the existence of any such "evil" compiler.
  5. delete c1, we don't need c1 as we have another compiler binary with us i.e. c2. Also since c2 is "evil", everything that compile from here in out is "danger".
  6. cc = c2(C1), now we are using c2 to compile a good C compiler source and forming cc. Keep in mind that as explained above, anything compiled by c2 is infected and infectious (later will be true only if the thing to be compiled is the C compiler, which it is in this case).
  7. delete c2, since we have cc with us, which has the capabilities of c2 and is also infected by it, we don't need c2 anymore. c2 is like the devil, which just infects people but on this occasion it has created a demon, which has the devil's powers but is also victimised by it. Also with this step, there remains no proof of the devil: c2. We had already deleted its source and now goes the binary as well.

Following the above procedure, the only working compiler binary left with us is cc and the only compiler source left is C1. Now we all think (and this may also be the anti-climatic truth) that cc is nothing but C1 compiled via the proper channels, without any shenanigans. But as we have seen that cc could also very well be the demon compiler. It is both infected and infectious. If cc is the compiler used to compile every program written in C (considering that it is the only compiler left after the procedure), then all such programs have a backdoor. And on top of that, if anyone tries to build a new C compiler from source C1, they only have cc with them to compile stuff. So any new compilers will also be like cc i.e. a devil. 
The astounding fact about this is that how simple this idea is and how it creates a bug that is virtually untraceable. Why is it untraceable you ask? Well because there are no lines of source of this bug around. This bug resides completely in binaries. It chooses to reproduce itself without having to reproduce its source [Quines]. Since there is no source, there is no evidence. Since there is no other compiler but cc or any compiler compiled by cc or any of its subjects, we simply don't have a good C compiler with us right now. What you do have is the source of the good compiler, C1. If you wish to compile an innocent compiler from that then you need to learn magic i.e. recreate compile() function.  






No comments:

Post a Comment