Biweekly Malware Challenge #3.2: Decrypting Oski Stealer Strings

Aim

Part one of this challenge has already been uploaded. In that part we unpacked the .NET layer of the sample, and in this part we will be decrypting the strings within the binary, so let’s get started!

Analysis

For this analysis, I opted to use both IDA Free and Ghidra for static analysis, which gives me a decompiler for the 32-bit sample as well as the Python bindings offered by Ghidra.

Upon opening the sample in IDA, we can very quickly locate the encrypted strings. They’re all referenced within one main function that likely decrypts all of the strings as soon as the sample runs, for use later on.

The strings appear to be base64 encoded, though after attempting to decode them we are still left with encrypted data, so there is most definitely another layer of encryption.

What is quite interesting is the lack of encryption for the C2 server, and the presence of 4 NOP instructions. This is a great indicator that the sample is likely cracked, as any blog post on Oski Stealer shows the C2 server being encrypted. In our case, the call to the decryption function has been NOP’ed out, even though it would be fairly trivial to encrypt the C2 and let the sample decrypt it, instead of risking potential program crashes.

Just above the plaintext C2 we can see a series of numbers, which are only referenced in one other place within the binary – inside a function that also references the encoded strings. This leads me to believe this is some kind of decryption key, and if true it allows us to narrow down the potential algorithms. The length is 18 bytes, so it rules out AES, and it doesn’t appear an IV is passed at all, so it is unlikely to be Salsa20 – my initial guess is possibly RC4, but let’s go ahead and continue analysing.
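The key-length reasoning above can be sketched as a quick check. This is only an illustration: the table of key sizes and the helper name candidate_ciphers are my own, and the key value is the digit string hardcoded in the script later in this post.

```python
# Key sizes (in bytes) accepted by each candidate cipher.
KEY_SIZES = {
    "AES": {16, 24, 32},
    "Salsa20": {16, 32},
    "RC4": set(range(1, 257)),  # variable-length key, 1 to 256 bytes
}

def candidate_ciphers(key):
    """Return the ciphers whose key-size rules allow this key."""
    return sorted(name for name, sizes in KEY_SIZES.items() if len(key) in sizes)

# The series of digits found just above the plaintext C2
suspected_key = b"056139954853430408"
print(len(suspected_key), candidate_ciphers(suspected_key))
```

A length check like this only narrows the field. It does not prove the algorithm, which is why the decryption routine itself still needs to be analysed.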

At this point I opened up Ghidra and jumped into the function referencing the strings, to utilise the decompiler and function recognition capabilities. It identifies the first two calls as calls to std::basic_string<>, so we can go ahead and rename them within IDA.

This function appears to either copy the second argument into the first, or simply create a pointer to the second argument. We can determine this as the encryptedString variable is not referenced after the call to basic_string<>, while the variable I’ve labelled as stringPointer is. The same process occurs for the possible key as well.

The next function (FUN_00422d00) appears to iterate multiple times, operating on blocks of 3. If you’ve ever done any research on how Base64 actually operates, you’ll know that it takes 3 plaintext bytes to create a 4 byte Base64 string. Therefore, decoding converts 4 bytes into 3. We already know the string is probably Base64 encoded, so we can make the assumption that this function will Base64 decode the second argument (our encoded string), and store the output within the first argument.
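If the 3-to-4 relationship is unfamiliar, here is a minimal Python 3 sketch of decoding a single Base64 block by hand. It is not the code from FUN_00422d00, just an illustration of the same idea, and decode_block is a made-up helper that ignores padding.

```python
import base64

# Standard Base64 alphabet: the index of a character is its 6-bit value.
ALPHABET = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def decode_block(block):
    """Decode one unpadded 4-character Base64 block into 3 bytes."""
    value = 0
    for ch in block:
        # Pack four 6-bit values into a single 24-bit integer
        value = (value << 6) | ALPHABET.index(ch)
    # Split the 24 bits back into three 8-bit bytes
    return bytes([(value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF])

encoded = base64.b64encode(b"Man")   # 3 bytes in, 4 characters out
print(encoded, decode_block(encoded))
```

A loop that consumes 4 input characters and emits 3 output bytes per iteration is therefore a strong hint that a function is a Base64 decoder.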

The next function is some kind of memory de-allocation function, likely used to free up the memory that contains the original Base64 encoded string – using the Ghidra decompiler allows us to easily identify the deallocate function, saving a ton of analysis time.

After the decoding and memory deallocation, there is a call to GetProcessHeap and HeapAlloc, with the latter taking the number of bytes to allocate from the return value of FUN_00401350.

Next, there are two calls to FUN_00401330, which call the function std::MyPtr. Without doing much research on this, it is likely just returning a pointer to a passed in string, and in this case it is getting pointers to the keyPointer and the Base64 decoded string.

Finally, we come to the last unidentified function, and likely the most important one. Immediately upon seeing this function we can see 256 appear twice, which strongly suggests RC4. So, let’s go ahead and confirm this before moving forward.

Passing the string KaoQpEzKSjGm8Q== into CyberChef, and providing the correct RC4 key, we do end up successfully decrypting the string, so now we can go ahead and start developing a script to automate this!
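The same check can be reproduced outside of CyberChef with a minimal Python 3 sketch (this is not the Ghidra script). It assumes the key is the ASCII digit string that is hardcoded in the script later in this post, and the names rc4_ksa and rc4_crypt are made up for illustration. In a textbook RC4, the two 256 constants would correspond to the 256-entry state array and the 256-iteration key scheduling loop.

```python
import base64

def rc4_ksa(key):
    """Key scheduling: permute the 256-entry state array using the key."""
    S = list(range(256))              # first 256: identity initialisation
    j = 0
    for i in range(256):              # second 256: key mixing loop
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    return S

def rc4_crypt(key, data):
    """XOR data with the RC4 keystream (encryption and decryption are identical)."""
    S = rc4_ksa(key)
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(byte ^ S[(S[i] + S[j]) % 256])
    return bytes(out)

# Sanity check against a widely published RC4 test vector
assert rc4_crypt(b"Key", b"Plaintext").hex() == "bbf316e8d940af0ad3"

KEY = b"056139954853430408"           # assumed: the digit string from the sample
decrypted = rc4_crypt(KEY, base64.b64decode("KaoQpEzKSjGm8Q=="))
print(decrypted)
```

Checking the implementation against a known test vector first means that any unreadable output points at the key or the algorithm guess, rather than at a bug in the sketch.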

Automation

We will be using the Ghidra Python API to develop this script, as IDA Free only includes IDC rather than IDAPython. The API is different to the one from Hex-Rays, however it is fairly straightforward. The API calls we will be using for this script are as follows:

getReferencesTo - get cross references to address
toAddr - convert integer to address 
getInstructionBefore - get instruction before certain address
getFromAddress - get address of cross reference
getInt - read a 4-byte integer from an address (here, an instruction operand)
getBytes - get certain number of bytes from address
setBytes - overwrite bytes at specified address with a string

Ghidra’s scripting uses Python 2.7 (via Jython) by default, whereas the tools on my OSX laptop are only set up for Python 3. So, we will be using this neat implementation of RC4 for decrypting, though all other modules should be available in Python 2.7 out of the box.

Putting the basics together, we have a script that can Base64 decode and then decrypt a string, which is simple enough. I’ve hardcoded the RC4 key for decryption, and added a simple locateXrefs() function, which returns the cross references to a provided address.

import base64

def KSA(key):
    keylength = len(key)

    S = range(256)

    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % keylength]) % 256
        S[i], S[j] = S[j], S[i]  # swap

    return S

def PRGA(S):
    i = 0
    j = 0
    while True:
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]  # swap

        K = S[(S[i] + S[j]) % 256]
        yield K

def RC4(key):
    S = KSA(key)
    return PRGA(S)

def convert_key(s):
    return [ord(c) for c in s]

def decryptString(encoded_string):

    key = convert_key("056139954853430408")
    keystream = RC4(key)

    decrypted_string = ""

    decoded_string = base64.b64decode(encoded_string)

    for i in range(0, len(decoded_string)):
        decrypted_string += chr(ord(decoded_string[i]) ^ keystream.next())

    return decrypted_string

def locateXrefs(address):
    return getReferencesTo(address)

At this point, we need to create the automation part of the script. All the encrypted strings are pushed to the decryption function directly before the call, so we will locate all cross references, get the instruction before, extract the address that the encrypted string is located at, read those bytes, decrypt them, and overwrite the encrypted string with the decrypted string.

instruction = getInstructionBefore(xref.getFromAddress())
stringAddress = getInt(instruction.getAddress().add(1))
tmp_byte = getBytes(toAddr(stringAddress), 1)[0]

The above code will extract the offset of the string, and retrieve the first byte – this sets up the script for the next block that will iterate through the located string, until it reaches a null byte. At this point, we should have extracted the entire string, and so can pass this into the decryptString() function.

encoded_string = ""
index = 0

while tmp_byte != 0:

    encoded_string += chr(tmp_byte)
    index += 1
    tmp_byte = getBytes(toAddr(stringAddress + index), 1)[0]
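To make the two steps above easier to follow without Ghidra open, here is a minimal Python 3 sketch that runs them against a fake memory image. It assumes the instruction before the call is an x86 PUSH imm32 (a 1-byte 0x68 opcode followed by a 4-byte little-endian address), which is what skipping one byte and calling getInt implies. The base address, offsets and helper names are all hypothetical.

```python
import struct

def push_imm32_operand(code, offset):
    """Return the 32-bit immediate of an x86 PUSH imm32 (opcode 0x68) at offset."""
    if code[offset] != 0x68:
        raise ValueError("not a PUSH imm32 instruction")
    # '<I' = little-endian unsigned 32-bit integer, read after the opcode byte
    return struct.unpack_from("<I", code, offset + 1)[0]

def read_c_string(memory, base, address):
    """Read bytes from the fake memory image until the null terminator."""
    start = address - base
    end = memory.index(0, start)
    return bytes(memory[start:end])

BASE = 0x00400000                      # hypothetical image base
image = bytearray(0x40)
# PUSH 0x00400020 placed at the start of the image
image[0x00:0x05] = bytes([0x68]) + struct.pack("<I", BASE + 0x20)
# Null-terminated encrypted string placed at offset 0x20
image[0x20:0x31] = b"KaoQpEzKSjGm8Q==\x00"

string_address = push_imm32_operand(image, 0)
print(hex(string_address), read_c_string(image, BASE, string_address))
```

If the instruction before a call is not a PUSH imm32, the sketch raises an error rather than reading a bogus address, which may be a useful guard to carry over into the real script.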

Once we have got the string and passed it into the decryptString() function, we will overwrite the encrypted string within Ghidra with the decrypted string. We append a null byte so that any leftover bytes of the longer Base64 string are not displayed as part of the decrypted string (a Base64 string is always longer than the data it encodes).

decrypted_string = decryptString(encoded_string)
setBytes(toAddr(stringAddress), decrypted_string + "\x00")
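The effect of the trailing null byte can be seen in a small Python 3 sketch that uses a bytearray as a stand-in for program memory. The data is hypothetical, the helper overwrite_c_string is made up, and the RC4 step is left out so only the length difference is shown.

```python
import base64

def overwrite_c_string(memory, offset, new_value):
    """Write new_value plus a null terminator over the bytes at offset."""
    data = new_value + b"\x00"
    memory[offset:offset + len(data)] = data

# Hypothetical memory: an 8-character Base64 string followed by another string
memory = bytearray(b"QUJDREVG\x00tail\x00")

decoded = base64.b64decode(bytes(memory[0:8]))   # 6 bytes from 8 characters
overwrite_c_string(memory, 0, decoded)
print(memory)
```

The leftover character from the old Base64 string is still in memory, but it sits behind the new terminator, so anything reading a C string from the start stops before reaching it.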

At this point we should be able to simply run the script – there is most likely a way to call this stringDecrypt() function from the command line, similarly to IDA, but in this case I will hardcode the call into the script, before executing it through the script browser.

Upon running the script, it successfully decrypts the strings!

There are also a large number of strings that inform us we are indeed looking at a stealer, such as SQL queries for credit card information within browser databases!

There are certain issues with the script however, specifically involving strange null bytes being written to different parts of a string. Here we can see Machine ID has a null byte 2 characters in, which causes some issues when the decompiler and disassembler attempt to display it.

And with that, we’ve now completed the challenge! The full code can be seen below, and feel free to share your write-up within the Discord channel or via your own blog post!

Make sure to keep an eye out for the next challenge!

import base64

def KSA(key):
    keylength = len(key)

    S = range(256)

    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % keylength]) % 256
        S[i], S[j] = S[j], S[i]  # swap

    return S

def PRGA(S):
    i = 0
    j = 0
    while True:
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]  # swap

        K = S[(S[i] + S[j]) % 256]
        yield K

def RC4(key):
    S = KSA(key)
    return PRGA(S)

def convert_key(s):
    return [ord(c) for c in s]

def decryptString(encoded_string):

    key = convert_key("056139954853430408")
    keystream = RC4(key)

    decrypted_string = ""

    decoded_string = base64.b64decode(encoded_string)

    for i in range(0, len(decoded_string)):
        decrypted_string += chr(ord(decoded_string[i]) ^ keystream.next())

    return decrypted_string

def locateXrefs(address):
    return getReferencesTo(address)

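def stringDecrypt(stringDecryptFunction):

    # Visit every call site of the string decryption function
    for xref in locateXrefs(toAddr(stringDecryptFunction)):
        # The instruction before the call pushes the encrypted string's address
        instruction = getInstructionBefore(xref.getFromAddress())
        # Skip the 1-byte opcode and read the 4-byte operand (the string address)
        stringAddress = getInt(instruction.getAddress().add(1))
        tmp_byte = getBytes(toAddr(stringAddress), 1)[0]
        encoded_string = ""
        index = 0

        # Read the Base64 string one byte at a time until the null terminator
        while tmp_byte != 0:

            encoded_string += chr(tmp_byte)
            index += 1
            tmp_byte = getBytes(toAddr(stringAddress + index), 1)[0]

        # Decrypt, then overwrite the encrypted string in place (null-terminated)
        decrypted_string = decryptString(encoded_string)
        setBytes(toAddr(stringAddress), decrypted_string + "\x00")

# Address of the string decryption function in this sample
stringDecrypt(0x00422f70)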

Author

0verfl0w_
