Writing Self-Modifying Code Part 2: Using extended assembly – Practice
Part 1 is here: /writing-self-modifying-code-part-1/
All the code for this tutorial is on GitHub. Links for particular components are interspersed, or you can just pull the repo. It seems my video lost a bit of audio at the end. All I was noting is that when you perform multiple calls in the same in-line assembly block, you can accidentally clobber return values. Just be careful.
The GitHub repo is here: https://github.com/aking1012/infosecinstitute-tutorial
So in the last tutorial we covered getting basic in-line assembly working. What if we want to move some C calls into the assembly? This is where extended assembly enters the picture. Say we want to set the read, write, and execute permissions from inside the function in question. We can do that because extended assembly lets us pass arguments in and out of assembly structures. Let’s do a quick demonstration. A simple example would be getting EIP in an assembly structure and using that as an argument for our VirtualProtect call. With MinGW, extended assembly has you declare, after the instructions themselves, the output operands, the input operands, and the clobbered registers, in that order. I use that in the demo code. For now, let’s just get EIP. So, without writing library and function lookup code we can just do something like
call label_eip; jmp after_eip; label_eip: movl (%%esp), %%eax; ret; after_eip:
if we want to get EIP. That was pretty simple. The only reason we have to do it that way is that x86 assembly doesn’t have a mnemonic for reading EIP directly.
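For reference, here is a minimal sketch of the same idea written as GCC-style extended assembly, so the captured address lands in a C variable. It is not the code from the repo. The function name current_ip is made up for illustration, the 32-bit branch uses the common call/pop variant rather than a helper that ends in ret, and the 64-bit and non-x86 branches are only there so the snippet builds on other targets.

```c
#include <stdint.h>

/* Hypothetical helper: returns an address inside this function, namely the
 * location right after the instruction that captured the instruction pointer. */
uintptr_t current_ip(void)
{
    uintptr_t ip;
#if defined(__i386__)
    /* call pushes the address of the next instruction; popl retrieves it */
    __asm__ volatile ("call 1f\n"
                      "1:\tpopl %0"
                      : "=r"(ip));          /* output operand only */
#elif defined(__x86_64__)
    /* 64-bit mode can address relative to RIP, so no call is needed */
    __asm__ volatile ("leaq 1f(%%rip), %0\n"
                      "1:"
                      : "=r"(ip));
#else
    /* non-x86 fallback: GCC/Clang "address of a label" extension */
    ip = (uintptr_t)&&here;
here:
#endif
    return ip;
}
```

Because the result comes back through an output operand, the compiler decides which register holds it, which avoids hard-coding EAX.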
Now we need to get the length of the encoded section. The simplest way to do this is to compile the whole DLL as an asset and pass that in as an argument. We want to do things on the ASM level though. If we use the snippet we just examined, we can add another in-line function at the end to springboard back to our get EIP function. It’s all assembled into a couple of source files for your perusal. First we use C to get the address of VirtualProtect. If you want to find that in ASM, I’m confident there are shell-coding tutorials that do it. Next we get the arguments to VirtualProtect. We already talked about how to get the start and end addresses, and the memory protection value is static. The only thing we still need is a place to put the old protection. We pass in the address of a variable for that; the code is in the source files on GitHub.
So now we have all the arguments for our VirtualProtect call on the stack. A good idea when you’re doing something like this is to work iteratively. If you launch normal C code that has a working VirtualProtect call and check the stack and register state right before the call is executed, it should roughly match the state your in-line assembly sets up. Everyone has those moments where they confuse a register or two in their head. Debuggers were built for this. Once you’re sure you have the stack set up correctly, you may notice that compiling in a call to VirtualProtect, at least in mingw32, and using GetProcAddress in wine give you two different addresses. Not to worry, it still works. Compiling in VirtualProtect() just gets you the address of a JMP to the destination, while GetProcAddress gets you the exact target.
This is in wine by the way. I haven’t tested this behavior in a VM.
Okay let’s modify our previous code to set permissions inside of the function instead of on load. This shouldn’t be horribly complicated. So what do we need? We already have all the pieces. Just put them together: EIP at the start, EIP at the end, and some simple math to get the length. As an educational exercise let’s do as much as possible in assembly. Instead of returning the values from in-line ASM and calling in C, let’s pass the address of VirtualProtect in to the ASM and make the call there. Now we have a working ‘set virtual protect parameter’ in in-line ASM. We set our outputs to variables from outside of the ASM structure. This way we know that it’s working in both directions. This could be taken further to actually in-line the entire VirtualProtect function if there is some API monitoring going on. That’s a little out of scope for this tutorial. Now that you can make C calls from in-line ASM, it’s a short journey to get there though.
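The “simple math” can be sketched in plain C as follows. This is an illustration only, not the repo code: the names region_length and pages_touched are hypothetical, and the 4096-byte page size is an assumption. The second helper reflects the documented VirtualProtect behavior that every page containing at least one byte of the range has its protection changed.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_ASSUMED 4096u   /* assumption: 4 KiB pages */

/* Length in bytes between the address captured at the start of the
 * section and the address captured at the end. */
size_t region_length(uintptr_t start, uintptr_t end)
{
    return (end > start) ? (size_t)(end - start) : 0;
}

/* Number of pages affected when [start, start + len) is passed to a
 * VirtualProtect-style call: every page holding any byte of the range. */
size_t pages_touched(uintptr_t start, size_t len)
{
    if (len == 0)
        return 0;
    uintptr_t first = start / PAGE_SIZE_ASSUMED;
    uintptr_t last  = (start + len - 1) / PAGE_SIZE_ASSUMED;
    return (size_t)(last - first + 1);
}
```

A short section that straddles a page boundary changes the protection of both pages, which is worth remembering when the region sits next to unrelated code.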
What if we wanted to call another function? Well, the fastcall convention changes where the arguments go: the first two are passed in ECX and EDX, which is closer to how 64-bit Unix systems pass arguments in registers. For most purposes, however, 32-bit Windows API functions take everything on the stack. Let’s just pass arguments through the defined extended assembly behaviors for simplicity’s sake. It can be done in other ways, but this is a lot easier. We could examine a printf call, but how much fun would that be? Let’s ramp it up a bit.
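As a reminder of what “the defined extended assembly behaviors” look like, here is a small sketch of the three operand lists: outputs, inputs, and clobbers. The function asm_add is a made-up example, not something from the repo, and the plain C branch only exists so the snippet builds on non-x86 targets.

```c
#include <stdint.h>

/* Hypothetical example: adds b to a inside the asm block and hands the
 * sum back to C through an output operand. */
uint32_t asm_add(uint32_t a, uint32_t b)
{
    uint32_t sum;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("movl %1, %0\n\t"      /* sum = a */
             "addl %2, %0"          /* sum += b */
             : "=&r"(sum)           /* outputs; & because sum is written before b is read */
             : "r"(a), "r"(b)       /* inputs */
             : "cc");               /* clobbers: addl changes the flags */
#else
    sum = a + b;                    /* fallback for non-x86 builds */
#endif
    return sum;
}
```

Letting the compiler pick the registers through "r" constraints, and listing anything else the block changes as a clobber, is what guards against the accidentally clobbered values mentioned at the top.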
So what happens if we want to add a function call that does something that we don’t know how to do? This API may be documented, but for educational purposes let’s pretend it isn’t. Let’s examine setting up a hosted network just as an example. Here we would hook up the debugger to netsh.exe and find out what it does while setting up a hosted network. I review both this and the VirtualProtect call in the video. This way we can implement it in C or assembly. Credit goes to Vivek Ramachandran, of http://www.securitytube.net, for particular uses of this approach. If I’m going to do it in C or Python ctypes, I may as well tell people about it though. The example is in the download. We could also use this type of approach to examine and fuzz particular APIs with Python ctypes.
So, if we’re targeting an undocumented API that is exported with a descriptive name, we need to breakpoint on the interesting inter-modular call. Then we let the code run. When we hit the breakpoint, we can examine the stack and register state. This should let us get an idea of how to prototype the function. Another interesting use for examining things in a debugger would be if you wanted to…I don’t know…add a certificate to the local store like you would with certmgr.exe. Think SSL man-in-the-middle without having to breach a certificate authority once you get code execution. Okay let’s take a look at the arguments that are passed in for netsh. Start with the hosted network command-line:
netsh wlan set hostednetwork mode=allow ssid=Testing key=justatest
We do this in the video. Well, I don’t see any really interesting calls in the initial imports for this particular purpose. I thought MS said no strcpy or memcpy in their code any more. It doesn’t seem that way…moving on.
Well, since nothing jumps out let’s just step over until something interesting happens. Wow, there’s a single call that loads a bunch of stuff that wasn’t in the import table. Media player does delayed loading too. Fontsub.dll was delay loaded by GDI also. Let’s take a look at those DLLs. I’m thinking there’s a trend in needing to fuzz delay loaded DLLs, but maybe it’s just me.
Back to what we were doing. Well a couple of the DLLs jump out at me, i.e. their names contain wlan. So let’s see what they export and breakpoint those addresses. So wlanapi.dll seems to have what we need. It has a bunch of WlanHostedNetwork* exports. This might be a little easier than I thought. If we don’t want to trace into that one call that loaded a bunch of stuff, we could just load the DLL in question into memory on our own before execution so we can set breakpoints. We could also use hooks and check the DLL name. Let’s just breakpoint LoadLibrary, step manually, and watch for simplicity’s sake. That, and the hook library lacks documentation or examples on-line…maybe I’ll remedy that soon.
Now that we’ve got it loaded, we need to breakpoint the interesting functions. I don’t know a simple way to list exported functions in the ImmDbg GUI. Here’s an IDAPython script to get RVAs of the interesting functions and another to breakpoint them in ImmDbg. It is tested and working on IDA 6.1 with Python 2.7. If these weren’t recognized/exported we would need to add them and name them in IDA. Now we can see what the interesting calls receive as arguments. It looks more complex than it probably needs to be. When we breakpoint on only the WlanHostedNetwork* calls it looks like some pointers and some iterators or numeric operands get passed in.
Let’s see what other function calls to this DLL get made. We are lucky in that we can observe which parameters get activated by which calls by watching the terminal output. Now we just have to find out where some of this garbage comes from and what it means…DON’T CHEAT AND LOOK AT THE API DOCS! It’s a learning experience and it’s supposed to be a little painful. Just a note, the first time I tried to find the base address of a DLL loaded by itself in IDA, it really jacked me up. The code is on GitHub. Hope it’s helpful to someone.
So now let’s look at the parent of the WlanHosted* call. We do this by examining the call stack. It looks like it comes from wlancfg.dll. We can breakpoint that function and examine the stack. It looks like that function handles getting all the strings on the stack and setting up the rest of the calls. So if we wanted to be really elegant, we would try to figure out how to set up for this single call with our CLI string reference.
Let’s keep going for all Wlan* calls. We’ll be interested in ones that occur near or before the WlanHosted* calls. It looks like a call to WlanOpenHandle, two calls to WlanHostedNetworkSetProperty with a WlanHostedNetworkQueryProperty in the middle, a call to WlanHostedNetworkSetSecondaryKey, then WlanFreeMemory and WlanCloseHandle. So we probably need to open a handle and set the enable and SSID properties. The secondary key is probably the key for the hosted network. After that it frees memory and closes the handle.
We can either examine each of the calls or the one call in wlancfg. I’m a fan of each call for the exercise but one call in practice. Even if there’s a more elegant way to do it, this is more practice. In reality, examining the single call would be way better for shellcode…assuming we could set it up right. That is left as an exercise for the reader. Wouldn’t it be cool though? Let windows string parsing handle a bunch of API calls for you?
Okay back to examining the calls. Check the text file in the github repo and the video for stack examinations. Okay, so we think we have this prototyped. Let’s use the same arguments in ASM. For simplicity we’re just going to enable the hosted network.
And it works. It was some great practice in reversing APIs. As it turns out, it is documented. Let’s see how right our postulations were. The API documentation says we were pretty close. Without a lot of testing we’re not ever going to get it perfect though. I thought I would point out one additional reason why I use MinGW. 64-bit MinGW still supports in-line assembly. 64-bit MSVC does not.
Hopefully this was useful. Take a look at other interesting things you can do with extended assembly. The possibilities become endless when you implement some operating system level functionality in C. Other approaches would include doing all this from Python with ctypes. It’s definitely useful but beyond the scope of this tutorial. We could also use this approach to pass lengths and offsets between assembly and standard encryption routines, in addition to changing memory protection from inside a function.
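To make the “lengths and offsets” idea concrete, here is a rough sketch that passes a buffer, an offset, and a length into an assembly loop. XOR with a single byte is only a stand-in so the example stays short; it is not a standard encryption routine, and the function name xor_region is hypothetical. The plain C branch is there for non-x86 builds.

```c
#include <stddef.h>

/* Hypothetical example: transforms len bytes starting at buf + offset in
 * place. XOR is a placeholder for a real cipher. */
void xor_region(unsigned char *buf, size_t offset, size_t len, unsigned char key)
{
    unsigned char *p = buf + offset;
    if (len == 0)
        return;                             /* the loop below assumes len > 0 */
#if defined(__i386__) || defined(__x86_64__)
    __asm__ volatile ("1:\n\t"
                      "xorb %2, (%0)\n\t"   /* transform one byte */
                      "inc %0\n\t"          /* advance the pointer */
                      "dec %1\n\t"          /* count down the length */
                      "jnz 1b"
                      : "+r"(p), "+r"(len)  /* read and written by the loop */
                      : "q"(key)            /* byte-addressable register */
                      : "memory", "cc");    /* buffer contents and flags change */
#else
    while (len--)
        *p++ ^= key;
#endif
}
```

Running it twice with the same key restores the original bytes, which makes it easy to check that the offset and length arrive in the assembly intact.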
To be clear, this didn’t touch on anti-virus evasion. It was simply extended assembly practice so we can pass arguments in and out of assembly structures. This, in the context of anti-virus evasion, would be useful for using an AES library to encrypt or decrypt a section of memory.