Jean-Baptiste's Blog

Part 1 of Analysing my first malware: Strings Deciphering

Disclaimer

I am just a beginner in malware analysis. I’m writing this blog post to help others and because it’s good practice:

If I can’t explain something correctly, then I do not really understand it.

If you want to follow along, you can download the sample on MalwareBazaar.

Introduction

Hi! Today I am going to start reverse engineering my first malware! Let me introduce you to Hamweq, a malware categorized as an IRC botnet. You can find out more about our friend by looking up its SHA256 hash: 4eb33ce768def8f7db79ef935aabf1c712f78974237e96889e1be3ced0d7e619

Let’s load the binary directly into IDA and get our hands dirty!

First glance from IDA

The binary is a 32-bit Windows executable. Looking at the different IDA tables, we can see that it has very few functions, imports, and unobfuscated strings:

IDA tables

We can already guess that the malware resolves its imports dynamically with LoadLibrary and GetProcAddress, after deciphering the names of the DLLs and their procedures!

Main Function

The main function first calls the following subroutine, with lpName pointing to "SeDebugPrivilege":

BOOL __cdecl sub_402781(LPCSTR lpName)
{
  HANDLE v1; // eax
  BOOL result; // eax
  BOOL v3; // esi
  struct _TOKEN_PRIVILEGES NewState; // [esp+0h] [ebp-14h] BYREF
  HANDLE TokenHandle; // [esp+10h] [ebp-4h] BYREF

  v1 = GetCurrentProcess();
  result = OpenProcessToken(v1, 0x28u, &TokenHandle);  // 0x28 = TOKEN_QUERY | TOKEN_ADJUST_PRIVILEGES
  if ( result )
  {
    v3 = 0;
    if ( LookupPrivilegeValueA(0, lpName, (PLUID)NewState.Privileges) )// SeDebugPrivilege
    {
      NewState.Privileges[0].Attributes |= 2u;
      NewState.PrivilegeCount = 1;
      v3 = AdjustTokenPrivileges(TokenHandle, 0, &NewState, 0, 0, 0);
    }
    CloseHandle(TokenHandle);
    result = v3;
  }
  return result;
}

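This tries to enable SeDebugPrivilege on the process token by setting the SE_PRIVILEGE_ENABLED attribute (0x2), effectively trying to elevate the process’s privileges. With SeDebugPrivilege enabled, a process can open other processes regardless of their security descriptors. Note that AdjustTokenPrivileges can only enable a privilege the token already holds, so this only works when the malware runs from a sufficiently privileged account.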
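As a quick sanity check on the constants in the decompilation, here is a minimal sketch. The values are the ones defined in the Windows SDK header winnt.h; the Python names simply mirror the C macros and are not part of the sample.

```python
# Access-right and privilege-attribute constants as defined in winnt.h.
TOKEN_ADJUST_PRIVILEGES = 0x0020
TOKEN_QUERY = 0x0008
SE_PRIVILEGE_ENABLED = 0x00000002

# Second argument passed to OpenProcessToken in sub_402781.
desired_access = TOKEN_QUERY | TOKEN_ADJUST_PRIVILEGES
print(hex(desired_access))

# Value OR-ed into NewState.Privileges[0].Attributes.
print(hex(SE_PRIVILEGE_ENABLED))
```

The first value matches the 0x28u seen in the call to OpenProcessToken, and the second matches the 2u OR-ed into the Attributes field.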

After this function, sub_4027E1 is called with the string "I0L0v3Y0u0V1rUs", followed by a LoadLibraryA call:

Call to LoadLibraryA

Since the argument to LoadLibraryA is encrypted, sub_4027E1 must be decrypting some strings…

Decryption routine

Looking at sub_4027E1, there is no doubt that it is in fact our decryption routine, and the string "I0L0v3Y0u0V1rUs" is our key.

void __cdecl sub_4027E1(LPCSTR lpString)
{
  LPCSTR *v1; // eax
  LPCSTR *v2; // esi
  int v3; // ebx
  int i; // ebp
  CHAR *v5; // eax

  if ( lpLibFileName )
  {
    v1 = &lpLibFileName;
    v2 = &lpLibFileName;
    do
    {
      v3 = 0;
      if ( lstrlenA(*v1) > 0 )
      {
        do
        {
          for ( i = 0; i < lstrlenA(lpString); ++i )
            (*v2)[v3] ^= lpString[i];
          v5 = (CHAR *)&(*v2)[v3++];
          *v5 = ~*v5;
        }
        while ( v3 < lstrlenA(*v2) );
      }
      v1 = ++v2;
    }
    while ( *v2 );
  }
}

lpLibFileName is the first slot of a null-terminated table of pointers to the encrypted strings, which live in the .data section.

The algorithm does the following:

  1. For each encrypted string in the table -> while ( *v2 )
  2. For each character of that string -> while ( v3 < lstrlenA(*v2) )
  3. XOR the character with every character of the decryption key "I0L0v3Y0u0V1rUs" in turn -> for ( i = 0; i < lstrlenA(lpString); ++i ) (*v2)[v3] ^= lpString[i];
  4. Invert every bit of the character -> *v5 = ~*v5;

Let’s write a Python script to decrypt the string table!

Python Scripting

Let’s first try decoding the very first entry in the table:

Encrypted Table First Entry

def decode_string(encoded: str) -> str:
    decode_key = "I0L0v3Y0u0V1rUs"
    decoded = ""

    for ce in bytes.fromhex(encoded):
        decoded_char = ce
        for ck in decode_key:
            decoded_char ^= ord(ck)
        
        decoded += chr(~decoded_char & 0xFF)     # ~ yields a negative int in Python; mask it back to one byte
    
    return decoded


encoded = "cbc5d2cec5cc93928ec4cccc"
decoded = decode_string(encoded)

print(encoded + " -> " + decoded)

Run the Python script, which simply implements the decryption algorithm, and we get just what we expected!

Decoding1
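Because XOR and bitwise NOT each undo themselves, the same transform should both decrypt and encrypt. Here is a minimal sketch of that round trip, working on raw bytes rather than str; the helper name transform is hypothetical and only used for illustration.

```python
KEY = b"I0L0v3Y0u0V1rUs"  # key passed to sub_4027E1


def transform(data: bytes, key: bytes = KEY) -> bytes:
    """XOR every byte with each key byte in turn, then invert it (mod 256)."""
    out = bytearray()
    for b in data:
        for k in key:
            b ^= k
        out.append(~b & 0xFF)
    return bytes(out)


encoded = bytes.fromhex("cbc5d2cec5cc93928ec4cccc")  # first table entry
decoded = transform(encoded)
print(decoded)

# Applying the transform a second time should give back the encrypted bytes.
assert transform(decoded) == encoded
```

For the first table entry this prints b'kernel32.dll'. The round trip is also handy for building test vectors without touching the sample.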

Note that the XOR operation is associative, meaning that B1 ^ B2 ^ B3 == B1 ^ (B2 ^ B3). The key characters can therefore be XORed together once, up front, so step 3 reduces to XORing the character with the single value 95 (0x5F):

XOR Operation
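The reduction can be checked in a few lines. This is a sketch under the assumption that the key is exactly the 15 ASCII characters shown above; it also folds the final bit inversion into the same constant, since inverting a byte is the same as XORing it with 0xFF.

```python
from functools import reduce
from operator import xor

KEY = b"I0L0v3Y0u0V1rUs"

# Fold the whole key into a single byte by XOR-ing its characters together.
key_byte = reduce(xor, KEY)
print(key_byte, hex(key_byte))

# Inverting a byte equals XOR-ing it with 0xFF, so the XOR step and the
# NOT step can be merged into one constant.
combined = key_byte ^ 0xFF
print(hex(combined))

# Check the shortcut against the step-by-step version for every byte value.
for b in range(256):
    assert (~(b ^ key_byte) & 0xFF) == (b ^ combined)
```

The folded key comes out as 95, and the merged constant as 0xa0, so the whole routine amounts to XORing each byte with 0xA0.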

Combining Python and IDA to defeat string encryption

Let’s use IDA’s Python API to grab bytes easily with the get_bytes(offset, count) function.

Our little snippet above now becomes:

from idc import get_bytes  # already available in IDA's console; needed when run as a script file


def decode_byte(byte: int) -> str:
    return chr(~(byte ^ 95) & 0xFF)
    
    
def decode_string_table():
    # range of the encrypted string table in the data section
    table_beg = 0x4053A4 
    table_end = 0x405A74     

    offset = table_beg

    decoded_str = ''

    while offset < table_end:
        enc_byte = ord(get_bytes(offset, 1))
        offset += 1

        # we reached end of a string entry, display it
        if enc_byte == 0:
            if decoded_str:
                print(decoded_str)

                # reset string being decoded
                decoded_str = ''

            continue

        # keep only 7-bit ASCII characters (decoded value below 0x80)
        if ord((dec_byte := decode_byte(enc_byte))) < 0x80:
            decoded_str += dec_byte

This implementation walks the string table byte by byte. A non-null byte is decoded and appended to the current string; a null byte marks the end of an entry, so the string decoded so far is printed and the null byte is skipped.

It also keeps only characters in the 7-bit ASCII range (below 0x80), so we don’t get flooded with junk data.

Paste this into IDA’s Python interpreter and call decode_string_table():
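The same walk can be tried outside IDA on a raw dump of the table. Here is a minimal sketch, assuming the table bytes are available as a bytes object; decode_table and encode are hypothetical helpers, and the demo blob is synthetic (built by encoding known strings), not taken from the sample.

```python
from typing import Iterator


def decode_table(blob: bytes) -> Iterator[str]:
    """Yield each decoded, null-terminated entry found in blob."""
    current = ""
    for enc_byte in blob:
        if enc_byte == 0:            # end of an entry
            if current:
                yield current
            current = ""
            continue
        dec_byte = ~(enc_byte ^ 95) & 0xFF
        if dec_byte < 0x80:          # same 7-bit ASCII filter as the IDA script
            current += chr(dec_byte)
    if current:                      # trailing entry with no null terminator
        yield current


def encode(text: str) -> bytes:
    """Inverse helper used only to build the synthetic demo blob."""
    return bytes(~(b ^ 95) & 0xFF for b in text.encode("ascii"))


blob = encode("kernel32.dll") + b"\x00\x00" + encode("JOIN") + b"\x00"
strings = list(decode_table(blob))
print(strings)
```

Unlike the IDA version, this sketch also yields a last entry that is not followed by a null byte, which is a design choice for working with truncated dumps.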

Decrypted Imports

Now you can see the imported DLLs and functions used by the malware!

Decrypted Strings

We can also see strings such as "JOIN", "PING" and "PRIVMSG", which are commands from the IRC protocol and will quite probably be used to talk to the C2 server.

We can also see two suspicious domain names, "acct7h1r733n.selfip.com" and "acc316h7.homelinux.org", which are possibly the C2 servers.

Patching encrypted strings into IDA

Now that we can retrieve the original strings, let’s add a few lines to our script to get them into our IDB:

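import idc  # already available in IDA's console; needed when run as a script file
from idc import get_bytes, patch_byte, create_strlit


def patch_string_table(addr, decoded_str):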
    print(f"{hex(addr)} -> {decoded_str}")

    for i in range(len(decoded_str)):
        patch_byte(addr + i, ord(decoded_str[i]))
    
    create_strlit(addr, idc.BADADDR)


def decode_byte(byte: int) -> str:
    return chr(~(byte ^ 95) & 0xFF)
    
    
def decode_string_table():
    table_beg = 0x4053A4
    table_end = 0x405A74

    offset = table_beg

    decoded_str = ''
    decoded_count = 0

    while offset < table_end:
        byte = ord(get_bytes(offset, 1))
        offset += 1

        if byte == 0:
            if decoded_str:
                str_addr = offset - len(decoded_str) - 1
                patch_string_table(str_addr, decoded_str)

                decoded_str = ''
                decoded_count += 1

            continue

        if ord((dec := decode_byte(byte))) < 0x80:
            decoded_str += dec 
    
    print(f"Decoded {decoded_count} strings from the table")

We just add the patch_string_table function, which replaces each encoded string with its decoded form using IDA’s patch_byte(offset, new_byte) function, then calls create_strlit so that IDA displays the patched bytes as a string. As before, just paste the code into the console and call decode_string_table().
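The start address passed to patch_string_table comes from a small piece of arithmetic, str_addr = offset - len(decoded_str) - 1, which can be checked on a toy layout. This is a sketch with a hypothetical helper, entry_addresses, and made-up bytes; only the base address is the one used in the script.

```python
def entry_addresses(base: int, blob: bytes):
    """Return (start_address, length) for each null-terminated entry."""
    entries = []
    offset = base
    length = 0
    for byte in blob:
        offset += 1                  # offset now points past the byte just read
        if byte == 0:
            if length:
                # offset is one past the terminator, so step back over the
                # terminator (1) and the entry itself (length).
                entries.append((offset - length - 1, length))
            length = 0
            continue
        length += 1
    return entries


# Synthetic layout: a 3-byte entry, a null, a 2-byte entry, a null.
layout = b"\xAA\xBB\xCC\x00\xDD\xEE\x00"
print([(hex(a), n) for a, n in entry_addresses(0x4053A4, layout)])
```

This prints [('0x4053a4', 3), ('0x4053a8', 2)]. Note that the formula only holds when every byte of an entry ends up in decoded_str: if the ASCII filter drops a byte, len(decoded_str) is shorter than the entry and the computed address shifts forward.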

Patched Strings

Now all the strings are automatically decoded in our IDB (note that this of course does not modify the binary itself), but we still need to rename the pointers to the strings, and we surely do not want to do this manually… An easy way to do this is to select the decrypted table with your mouse:

Mouse Selection

…then load the built-in script renimp.idc located in <IDA Installation>/idc/.

renimp.idc effects

Now we get a much more understandable decompilation, and we can already guess what the malware is doing next… But that will be for another blog post :D