Monday, June 11, 2012

Why Flame is a pain to analyze - a look at its intricate compilation style.

This post is about some peculiarities of the assembly code of Flame, the malware infiltrating Iranian computers. Note that I'm not going to give you any additional details or new findings about its analysis; if you are interested in that kind of material, I suggest you read the report written by CrySyS, which is by far the most comprehensive description available of its different components.
Aside from that, it should be noted that although the main functionalities of Flame have been identified, there's still a lot of undocumented code. So I hope that, for those of you who want to perform their own analysis, it will be helpful to understand more about its compilation style, and that's why I'm writing these little notes.
In order to do that I decided to discuss a specific routine in the "advnetcfg.ocx" file: the RC4 encryption routine. In particular, I focused on the attempt to retrieve the key.
Although I'm not the first one to find it, as it also appears in the CrySyS report cited above (which does not describe the procedure), the purpose of this post is to show you how a standard task like this is made intricate and time-consuming by the compilation style.
This is only one example of this kind of code structure; you will find it all over the malware. Of course, this isn't the only peculiarity that makes its code more difficult to understand: maybe there will be a sequel to continue this discussion.
First, we will look at the RC4 algorithm to identify which parameter holds the key. Even knowing that, however, is not enough to find its content directly, and we will have to go through some intricate code to track down its value.
Let's get started.
Analyzing RC4
Looking at the code, we notice the following loop:
.text:1002598F                 mov     [eax+ecx], al
.text:10025992                 inc     eax
.text:10025993                 cmp     eax, 100h
.text:10025998                 jl      short loc_1002598F
This is a typical sign of the RC4 algorithm: the loop fills a 0x100-byte (256 decimal) array, which is the initial permutation box. Just compare it to one of the RC4 source codes available online (this, for instance), and look for the assembly-to-C correspondence:

for (i = 0; i < 256; i++)
state->perm[i] = (u_char)i;
Then we can see another clear sign of RC4:   
.text:1002599C                 mov     [ecx+100h], dl 
.text:100259A2                 mov     [ecx+101h], dl
It obviously refers to:
state->index1 = 0;
state->index2 = 0;
Putting these lines together we get the RC4 "state" structure, which is initialized by the "rc4_init" function. You can also notice that the "rc4_crypt" function appears right after it, probably because the code was simply copied from a source similar to the one we are referring to.
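As a rough sketch of the structure these offsets imply, assuming the layout used by the reference C source (a 256-byte permutation followed by two one-byte indices), the stores at +100h and +101h line up as shown below. The helper name rc4_state_reset is hypothetical and only mirrors the first half of "rc4_init"; uint8_t stands in for the u_char type of the reference source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* State layout assumed from the reference rc4.c source. */
struct rc4_state {
    uint8_t perm[256];  /* offsets 000h..0FFh: permutation box */
    uint8_t index1;     /* offset 100h */
    uint8_t index2;     /* offset 101h */
};

/* Hypothetical helper: the part of rc4_init seen in the two listings above. */
void rc4_state_reset(struct rc4_state *state)
{
    int i;

    assert(state != NULL);
    for (i = 0; i < 256; i++)       /* mov [eax+ecx], al ... cmp eax, 100h */
        state->perm[i] = (uint8_t)i;
    state->index1 = 0;              /* mov [ecx+100h], dl */
    state->index2 = 0;              /* mov [ecx+101h], dl */
}
```

Since every field is a single byte, the compiler inserts no padding, which is why the two indices sit immediately after the 256-byte array.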
We also know that the prototype of the "rc4_init" function is:
void rc4_init(struct rc4_state *const state, const u_char *key, int keylen);
But in the assembly code we see only two parameters:

.text:10025986 arg_0           = dword ptr  8
.text:10025986 arg_4           = dword ptr  0Ch
This is weird! It means that one of them is missing: why? For the moment let's just say that the answer is related to the intricate nature of the code that I will clarify later.
First let's look for the code that uses the key. In the C code we have:
j += state->perm[i] + key[i % keylen];
We are interested in finding the assembly counterpart of the last addend:
.text:100259DA                 idiv    [ebp+arg_4]
This tells us that arg_4 is the key length. Moreover:
.text:100259DD                 inc     [ebp+var_8]
.text:100259E0                 cmp     [ebp+var_8], 100h
.text:100259E7                 jl      short loc_100259AF
So, var_8 in the assembly code is the counter i in the C code, and to find the key we have to look for an assembly instruction that reads one byte from memory. This consideration leads us to:
mov     bl, [esi+edi]
We are interested in edi, which comes from arg_0:

.text:100259B2                 mov     edi, [ebp+arg_0]
that is... the key!
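To make the correspondence easier to follow, here is a minimal sketch of the whole algorithm, written after the reference source rather than recovered from Flame itself. The comments map the pieces discussed above (var_8, arg_0, arg_4) onto the C code; uint8_t replaces u_char, and rc4_crypt is included only so that the sketch can be checked against a known test vector.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct rc4_state {
    uint8_t perm[256];
    uint8_t index1;
    uint8_t index2;
};

static void swap_bytes(uint8_t *a, uint8_t *b)
{
    uint8_t tmp = *a;
    *a = *b;
    *b = tmp;
}

/* Key schedule, same shape as the reference rc4_init. */
void rc4_init(struct rc4_state *state, const uint8_t *key, int keylen)
{
    uint8_t j = 0;
    int i;

    assert(state != NULL && key != NULL && keylen > 0);

    for (i = 0; i < 256; i++)           /* identity permutation */
        state->perm[i] = (uint8_t)i;
    state->index1 = 0;
    state->index2 = 0;

    for (i = 0; i < 256; i++) {         /* i is var_8 in the listing */
        /* key is arg_0 (loaded into edi), keylen is arg_4 (the idiv operand) */
        j = (uint8_t)(j + state->perm[i] + key[i % keylen]);
        swap_bytes(&state->perm[i], &state->perm[j]);
    }
}

/* Keystream generation and XOR; encryption and decryption are the same. */
void rc4_crypt(struct rc4_state *state, const uint8_t *in, uint8_t *out, int len)
{
    int i;
    uint8_t j;

    for (i = 0; i < len; i++) {
        state->index1++;
        state->index2 = (uint8_t)(state->index2 + state->perm[state->index1]);
        swap_bytes(&state->perm[state->index1], &state->perm[state->index2]);
        j = (uint8_t)(state->perm[state->index1] + state->perm[state->index2]);
        out[i] = in[i] ^ state->perm[j];
    }
}
```

With the key "Key" and the input "Plaintext", standard RC4 produces the bytes BB F3 16 E8 D9 40 AF 0A D3, which is a quick way to confirm that a routine found in a binary really is plain RC4 once the key is known.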
Well, here we are... we found the key... but are we done? Usually the answer would be "yes", but in this case there's more work to do and this is where the code becomes intricate.
Tracking the key

Now we know that the key is passed to the "rc4_init" function as the first argument and we want to track it back to see its content. So, we follow the code using the Cross References and notice that eax corresponds to arg_0, as it is pushed right before the call to "rc4_init":
.text:1000E69F                 call    get_key_object
.text:1000E6A4                 push    eax
.text:1000E6A5                 lea     ecx, [esi+4]
.text:1000E6A8                 call    rc4_init

What about eax?
It comes from the "get_key_object" call, from which we get:
.text:1000C537                 mov     eax, [ecx+4]
.text:1000C53A                 mov     eax, [eax+0Ch] 
.text:1000C53D                 add     eax, [ecx+8]
.text:1000C540                 retn

A little remark: by convention, the C++ "this" pointer is stored in the ecx register. If you are interested in reversing C++ applications you should read this paper as a starting point. More info about the "this" pointer can be found here.
Basically, the code above reads a pointer and then adds something to it, leading to the final pointer to the key. In particular, you can picture the whole thing as a "memory buffer" object that contains a pointer to the data and an index to access it.
Something like this:

 00 |    ...        |          Obj_data
    +---------------+      +---------------+
 04 | ptr Obj_data  | ---> |     ...       | 00
    +---------------+      +---------------+
 08 |   Index       |      |     ...       | 04
    +---------------+      +---------------+
    |    ...        |      |     ...       | 08      Key
                           +---------------+         +--+
                           | ptr byte Key  | 0C ---> |  | 0
                           +---------------+         +--+
                           |    ...        |         |  | 1
                                                     |..| 2
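The same picture can be written down as a small C sketch. All structure and field names here are hypothetical; only the field order follows the diagram, and the offsets in the comments hold for the 32-bit binary, where pointers are 4 bytes wide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; the fields marked "unknown" were not examined. */
struct obj_data {
    uint32_t unknown_00;
    uint32_t unknown_04;
    uint32_t unknown_08;
    uint8_t *bytes;         /* +0Ch: pointer to the raw bytes */
};

struct mem_buffer {
    uint32_t unknown_00;    /* +00h */
    struct obj_data *data;  /* +04h: ptr Obj_data */
    uint32_t index;         /* +08h: index into the bytes */
};

/* C rendering of get_key_object; 'self' plays the role of ecx ("this"). */
uint8_t *get_key_object(const struct mem_buffer *self)
{
    assert(self != NULL && self->data != NULL);
    /* mov eax, [ecx+4] / mov eax, [eax+0Ch] / add eax, [ecx+8] */
    return self->data->bytes + self->index;
}
```

Read this way, the routine is just an accessor that returns the address of the current position inside the buffer, which is why its return value can be handed straight to "rc4_init" as the key pointer.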

Now we have to follow ecx before "get_key_object" is called, and we see:
.text:1000E69C                 lea     ecx, [ebp+var_20]
So, we want to investigate when "var_20" is filled with a value.
.text:1000E67F                 mov     esi, ecx
.text:1000E681                 push    [ebp+arg0]
.text:1000E684                 lea     eax, [ebp+var_20]
.text:1000E687                 lea     ebx, [esi+108h]
.text:1000E68D                 push    eax
.text:1000E68E                 call    key_from_arg0?
From the code above we may think that the key is passed through arg0, but if we try to follow arg0 via Cross Reference we don't go very far:
.text:1000E5CE                 push    0              
.text:1000E5D0                 lea     eax, [ebp+var_20]
.text:1000E5D3                 push    eax             
.text:1000E5D4                 xor     ebx, ebx
.text:1000E5D6                 call    instantiate_object
.text:1000E5DB                 mov     byte ptr [ebp+var_4], 2
.text:1000E5DF                 push    eax             
.text:1000E5E0                 mov     ecx, esi
.text:1000E5E2                 call    do_rc4
arg0 is the first parameter of the function we were in before following the Cross Reference; let's call it "do_rc4". So we have to follow eax, which is the return value of the "instantiate_object" function. This call takes 0 and var_20 as its parameters and returns an empty object.

A dead end, indeed... or maybe not! Let's reconsider the parameters passed to the "key_from_arg0?" function: maybe the parameter we are interested in isn't passed via the stack, but via a register... Maybe the missing piece is the instruction:
.text:1000E687                 lea     ebx, [esi+108h]

and we have to follow esi+108h instead of arg0!
At the top of the "do_rc4" function we notice:
.text:1000E67F                 mov     esi, ecx
So, esi+108h is passed to the "do_rc4" function, via the "this" pointer.
Now let's follow back the cross reference; if we scroll up the code we notice:
.text:1000E5B2                 push    [ebp+p_key_bytes]
.text:1000E5B5                 mov     ebx, [ebp+arg_8]
.text:1000E5B8                 lea     eax, [esi+108h]
.text:1000E5BE                 push    eax             
.text:1000E5BF                 mov     dword ptr [esi], offset off_10073520
.text:1000E5C5                 call    instantiate_object
This totally makes sense! There is a second call to the "instantiate_object" function and this time its parameters are p_key_bytes and esi+108h. It makes us think that this function creates an object with the bytes of the key from p_key_bytes and puts its address in esi+108h.
OK, here we go... again! Thinking recursively: let's call the function we are in "do_rc4_2" and follow p_key_bytes via Cross Reference to see when it is filled.
.text:1000129A                 lea     ecx, [ebp+58h]
.text:1000129D                 call    get_key_object
.text:100012A2                 push    eax         
.text:100012A3                 lea     eax, [ebp-1F4h]
.text:100012A9                 push    eax            
.text:100012AA                 call    do_rc4_2

"p_key_bytes" is the second parameter of "do_rc4_2" and to investigate its value we have to follow eax, that is... the return value of the "get_key_object" function we have already described. It reads an object from the address contained in ecx... that is... the one contained in ebp+58h! Really, really weird!
Why ebp+58h? Are there so many parameters on the stack?

In order to understand the situation properly, we have to go to the beginning of the function "do_rc4_2":
.text:10001230                 push    ebp
.text:10001231                 sub     esp, 48h
.text:10001234                 mov     eax, offset sub_1006A3CF
.text:10001239                 call    __EH_prolog
To skip some boring calculations, let's just say that "__EH_prolog" sets the value of ebp to esp-4. So, after the execution of these instructions, the stack will look like this:
... [prolog][48h bytes][ebp][ret_addr][param_1][param_2] ...
prolog + 48h + ebp + ret_addr + param_1 = 4h + 48h + 4h + 4h + 4h = 58h
It sounds good! It means that the code points to param_2.
Once again... let's call the function we are in "go", and look for its second parameter via Cross Reference.
.text:10003254                 sub     esp, 14h
.text:10003257                 mov     eax, esp
.text:10003259                 mov     [ebp+78h], esp
.text:1000325C                 push    eax
.text:1000325D                 mov     ebx, [ebp+68h]
.text:10003260                 call    do_newcopy_addref
.text:10003265                 mov     byte ptr [ebp-4], 2
.text:10003269                 push    dword_10091C08
.text:1000326F                 mov     byte ptr [ebp-4], 1
.text:10003273                 call    go              
And here comes the problem... we are looking for the second parameter, but there's only one push! Don't panic.
Let's look at the code: first it allocates memory on the stack with the sub esp, 14h instruction, and then it calls the "do_newcopy_addref" function, which copies something from the value stored at ebp+68h into the 14h bytes just reserved on the stack (once again, ebp+68h is passed via a register!).
So, we have to re-figure out what the stack looks like:
... [prolog][48h bytes][ebp][ret_addr][param_1][14h bytes object] ...
Basically, param_2 is a 14h bytes object.
This is unusual, as normally the code would have passed a pointer to the object instead of the object itself. This also makes the code more difficult to analyze because, in this way, IDA cannot recognize the parameter anymore.
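The difference between the two calling styles can be illustrated with a small C analogy. This is only a sketch: the 14h-byte structure and both function names are hypothetical, and the original code is C++ that copies the object through a dedicated routine rather than a plain structure copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 14h-byte (20-byte) object; the real field meanings are unknown. */
struct buf_obj {
    uint32_t field[5];
};

/* By value: the caller copies all 14h bytes into the argument area. */
uint32_t go_by_value(uint32_t param_1, struct buf_obj param_2)
{
    param_2.field[0] += param_1;    /* changes only the callee's copy */
    return param_2.field[0];
}

/* By pointer: the caller pushes a single address (4 bytes on 32-bit x86). */
uint32_t go_by_pointer(uint32_t param_1, struct buf_obj *param_2)
{
    assert(param_2 != NULL);
    param_2->field[0] += param_1;   /* changes the caller's object */
    return param_2->field[0];
}
```

In the by-value case the disassembly shows a stack reservation and a copy instead of a push, which is the pattern we ran into above: the argument is there, but not in the form we expect to see.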
We are almost done: let's focus on ebp+68h and try to track it back!
.text:1000323E                 push    dword ptr [ebp+78h]
.text:10003241                 lea     eax, [ebp+68h]  
.text:10003244                 push    eax
.text:10003245                 call    sub_1000346A   
The reasoning is always the same: we see a function with two parameters, one of which is ebp+68h; so, we can suppose that the other one, that is ebp+78h, points to the bytes of the key and the function instantiates an object by making a copy from the key itself.
Now, we have to follow ebp+78h. It reminds us of the weird parameter ebp+58h we saw before... So, again, we go to the beginning of the function and notice:
.text:100031FA                 push    ebp
.text:100031FB                 sub     esp, 6Ch
.text:100031FE                 mov     eax, offset loc_1006ACBC
.text:10003203                 call    __EH_prolog

This time the stack will look like this:
... [prolog][6Ch bytes][ebp][ret_addr][param_1][param_2] ...
prolog + 6Ch + ebp + ret_addr = 4h + 6Ch + 4h + 4h = 78h
So, ebp+78h points to param_1.
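Both offset calculations (58h earlier and 78h here) follow the same pattern, so they can be checked with a tiny helper. This is a sketch that simply encodes the stack layout described in this post; the function name param_offset is hypothetical.

```c
#include <assert.h>

enum { SLOT = 4 };  /* size of one 32-bit stack slot */

/*
 * Offset from ebp to the Nth stack parameter (1-based), following the layout
 * [prolog][locals][saved ebp][ret_addr][param_1][param_2]...
 */
unsigned param_offset(unsigned locals_size, unsigned param_number)
{
    assert(param_number >= 1);
    return SLOT                         /* prolog slot */
         + locals_size                  /* bytes reserved by "sub esp, N" */
         + SLOT                         /* saved ebp */
         + SLOT                         /* return address */
         + SLOT * (param_number - 1);   /* parameters that come first */
}
```

For example, param_offset(0x48, 2) gives 58h and param_offset(0x6C, 1) gives 78h, matching the two cases we worked out by hand.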
Again, we go via Cross Reference to follow param_1 and see:
.text:100126FB                 push    [ebp+arg_0]
.text:100126FE                 call    sub_100031FA
arg_0 is our target! Another first parameter to follow, another Cross Reference to see:
.text:100033BA                 push    [ebp+arg_0]
.text:100033BD                 call    sub_100126D5
But now we are in a very special function:
.text:100033A4 UpdateTBSList   proc near

It is an exported function, but even knowing that does not let us retrieve the key, because it is not called from within the executable module itself...!
[Figure: visual representation of the whole analysis we have done]

I hope this discussion has given you an idea of how much this kind of code structure can complicate things... although we went very deep into the code to track the key back, even at the end of our analysis we didn't find its value!
Are we close to it? Mmm... close enough at least :P
I'm not going to describe every single detail, but let's just think of the next logical step.
You may think about looking for the call to "UpdateTBSList" in the other components of Flame, but you won't find anything because the strings are encrypted! So, first you have to decrypt the strings and then you can look in every component of the malware to find where the export is called :)
But, even knowing that... once you have finally retrieved the key... what is it useful for? Was this time-consuming effort worth it?
Well, it definitely is but, to understand why, you should conduct further investigation... :) This "never-ending task" makes us think about the direction malware analysis has been taking in recent years: a lot of effort, a lot of patience and a lot of dedication are required to perform even a small analysis like this one!