
Biweekly Malware Challenge #4: Operation DreamJob

Aim

The goal of this fourth challenge is slightly different from the others. It relies on some level of static analysis to complete, so it may be difficult to get through. However, OSINT is also a possible method of attack if you are struggling, and in malware analysis there isn’t really such a thing as “cheating”; it’s just making your life easier!

So, the challenge for this week: we recently received a fairly large file, around 4MB in size, masquerading as the legitimate KeePass software. It is strongly believed that the file has been backdoored and contains malicious code, though we have been unable to locate where the malicious code is stored or determine its purpose. Are you able to locate the malicious code and identify its core functionality?

Approach

There are several possible approaches to this challenge, be it dynamic analysis, binary diffing, OSINT, YARA scanning for byte patterns, etc. However, I decided to approach this like a typical sample, relying on static analysis and IDA to find my way. This may be more difficult for larger samples, but in this particular case it was well suited to the job. So, let’s get started!

Analysis

Checking out the strings within the sample, it’s clear that it is linked to KeePass, a secure, open-source password manager, which makes it a prime target for backdooring. Based on the challenge prompt we already know it is backdoored, so let’s go ahead and start trying to find the malicious code.

There are at least two options in this situation. The first is to locate a copy of the legitimate KeePass software of the same version and perform binary diffing between it and the sample; however, that can be fairly time-consuming, especially when trying to locate an older version of a program. The other option is to manually search through the binary, which is what we’ll be doing. There are some shortcuts, though, and they involve focusing on certain APIs.

Presuming we’re dealing with a packed sample or an unpacked RAT, loader, or some other malware variant, it is likely that one of the following APIs will be involved in the unpacking or general execution of the malicious code:

VirtualAlloc
VirtualProtect
LoadLibraryA/W
GetProcAddress

So, we’ll start by finding cross-references to these APIs and see whether we can locate some malicious code.

Luckily for us, VirtualAlloc is only referenced 4 times within the program, all within the same function, which is typically a good sign.

Analysing this function, it is clear something interesting, if not malicious, is going on: pointers to API functions are being stored within a structure, VirtualAlloc is called twice, and the address held in variable v22 is called with 3 arguments. This is an even better sign that the code is not only malicious, but will load something else and execute it.

Jumping to the start of the function, we can see checks comparing the data within arg1 (Src) to “MZ”, “PE”, and 0x8664 (IMAGE_FILE_MACHINE_AMD64), which suggests it is parsing an executable header. So we can already paint a decent picture: this malicious code will load an executable and execute it within memory. Now let’s try to find where this executable comes from.

Stepping out of the function, there are calls to CreateFileW and ReadFile, and the resulting buffer is passed through a few different functions before it reaches the function responsible for loading and executing it within memory. The filename isn’t hardcoded, however, so we’ll need to locate where it is initialised.

At the very start of this function, it appears the filename is initialised with data from qword_140390048 + 8, but only if dword_140390044 is set to the value 4. So, let’s jump to the function that initialises qword_140390048.

The function initialising these two variables is auto-named by IDA as setargv, with the variables being initialised by data returned via calls to parse_cmdline.

Therefore, we can assume that, as dword_140390044 is treated as an integer, it contains the number of arguments passed via the command line, and as qword_140390048 is treated like a structure (in practice, an array of pointers), it contains pointers to each argument. A basic example of how the arguments would be parsed can be seen below.

"C:\Users\User1\Desktop\keepass.exe C:\Users\User1\Desktop\file.dat abc"

dword_140390044 = 3

qword_140390048 + 0: "C:\Users\User1\Desktop\keepass.exe"
qword_140390048 + 8: "C:\Users\User1\Desktop\file.dat"
qword_140390048 + 16: "abc"

Now it may seem odd that 1 is subtracted from the argument count before it is stored in dword_140390044: if there are 4 arguments, you might expect dword_140390044 to end up containing only 3. This is where it is important to check the internals of the parse_cmdline function, as the variable tracking the number of arguments is actually set to 1 initially rather than 0. So if 4 arguments are passed, the count would reach 5, and subtracting 1 gives the expected 4. (I haven’t spent much time looking at the function, so I could be wrong, but it wouldn’t make much sense if this wasn’t the case!)

Jumping back to the function that copies the command-line values into variables, we can now define a basic structure to properly visualise the decompiled output. As the sample is 64-bit, each pointer is a QWORD, so the structure looks as follows:

struct commandLineArgs {
    _QWORD *argument1Pointer; // argv[0]: the file path of the process
    _QWORD *argument2Pointer; // argv[1]
    _QWORD *argument3Pointer; // argv[2]
    _QWORD *argument4Pointer; // argv[3]
};

Checking the rest of the function, it appears that the third user-supplied argument (argument4Pointer) can contain multiple values, as the sample parses it in further detail if dword_140390044 (the number of arguments) is greater than 4.

With a better understanding of the variables, we can now move back to the block of malicious code that opens and reads from a file. Counting from the first user-supplied argument (so argument_1 corresponds to argument2Pointer), it’s clear that argument_1 is a filename, while argument_2 is passed into sub_14025F9C0 as the second argument. argument_3 is converted to a wide string before being passed into sub_14025E8A0, along with the possible payload data.

Let’s now move into sub_14025F9C0 and see what it does with argument_2. Immediately we are met with a number of calculations involving hardcoded global variables, specifically byte_1402EE0DD.

Viewing the data at 0x1402EE0DD, we can see it is actually part of an AES S-box (0x7B777C63, the first four S-box bytes 63 7C 77 7B read as a little-endian DWORD). As this function only takes 2 arguments, it is probably the AES key initialisation function, and since its second argument is argument_2 from the command line, argument_2 is likely an AES decryption key.

Knowing we have an AES initialisation function, we can presume (or simply check, given the abundance of mathematical operations!) that the next function is an AES decryption function, decrypting the buffer that was read in from the file.

So, now we can go back to the very first function containing the VirtualAlloc calls and clean it up by defining structures and importing known ones such as IMAGE_DOS_HEADER and IMAGE_NT_HEADERS. Once that is done, we can see that the payload entry point is called with the first argument set to the payload base address, the second set to the value 1, and the third set to argument_3 retrieved from the command line. This matches the DllMain(hinstDLL, fdwReason, lpvReserved) prototype, where 1 is DLL_PROCESS_ATTACH, which suggests the payload is a DLL whose reserved parameter has been repurposed to carry the arguments.

This tells us that the first argument passed to this sample is a file path to an AES encrypted file, the second argument is the AES key to decrypt the file data, and the third argument is a string of arguments to pass to the decrypted file data when executing it.

For those who may not be aware yet, this sample has been linked to Lazarus, and in particular to their ongoing campaign dubbed “DreamJob”. The campaign typically involves the threat actors posing as recruiters and sending job offers to unsuspecting victims, when in fact they are attempting to infect the victim with a backdoor. You can read more about this particular campaign here.

And with that, we’ve now completed the challenge! Feel free to share your write-up within the Discord channel or via your own blog post!

Make sure to keep an eye out for the next challenge!

Author

0verfl0w_
