0x8BC4 Before You 0xFFE0

by XlogicX

Get it?  If not, that's because assembly is too high level.

This article contains assembly and machine code, but it is really more about layers of abstraction, and why we seek the lowest one we can understand. It explains why we always say that hackers are most interested in how things work. I agree with the findings of Vuk Ivanovic in issue 33:3 that to truly understand certain exploits, the lower levels of programming (C and assembly language) are all but required.

I wouldn't claim that all exploitation requires knowledge of C or assembly, though. For example, TCP/IP has been exploited countless times. Exploitation like this rarely comes from something as high-level as a browser (though sometimes it's possible); it comes instead from low-level tools like Netcat, Scapy, or socket programming in your language of choice. Of course, you would be using these tools armed with a deeper knowledge of how TCP/IP actually works (how it's implemented), not just what the RFC states.

Back to Assembly

There are numerous layers below assembly language, like machine code, microcode, logic gates, transistors, electrons, and probably many layers in between. One of my favorite assembly instructions is "ASCII Adjust AX Before Division" (AAD), along with the related AAM instruction. It's my favorite because it challenges many assumptions about what the instruction is intended to be used for.

The intent is to take a two-byte register (AX) that holds a value from 00 to 09 (hex) in each byte, one decimal digit per byte (unpacked Binary-Coded Decimal [BCD] encoding), and convert/pack it into the correct "binary" value in the low byte of that register (AL). The high byte (AH) is cleared.

So if the two bytes were 07 and 09 (BCD for 79), then the resulting AL register would contain hex 4F (because 4F is hex for decimal 79).  This is intended to be used before a division instruction, but that's just a suggestion.
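A small Python model makes the intended use concrete. It is a sketch, not Intel's implementation, and the helper names are made up for illustration: two ASCII digits are masked down to unpacked BCD (one digit per byte of AX), then the base-10 adjustment packs them into a single binary value.

```python
def ascii_digits_to_unpacked_bcd(text: str) -> int:
    """Turn two ASCII digits like "79" into AX = 0x0709 (one digit per byte)."""
    tens, ones = (ord(c) & 0x0F for c in text)  # '7' is 0x37; & 0x0F leaves 0x07
    return (tens << 8) | ones

def aad10(ax: int) -> int:
    """Model of AAD with the default D5 0A encoding: pack AH:AL into AL, clear AH."""
    ah, al = (ax >> 8) & 0xFF, ax & 0xFF
    return (al + ah * 10) & 0xFF  # this is the new AX, since AH ends up 0

ax = ascii_digits_to_unpacked_bcd("79")
print(hex(ax))         # 0x709
print(hex(aad10(ax)))  # 0x4f
```

For every pair of digits from 0 to 9 the model agrees with plain decimal arithmetic (10 * tens + ones); the interesting part is what happens outside that range.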

Since a byte can hold 256 possible values and the instruction expects only 00-09, the first question a hacker may ask is what happens when we go out of range. What if we put hex 1337 into those two bytes? Nothing breaks, and AL contains hex F5. Everything is working as planned, just at a much lower level (microcode)... we will get down there soon.

Machine Code

The suggested machine code (by Intel) that an assembler (NASM, GNU Assembler, etc.) should create for AAD is: D5 0A

The Intel manual (Vol. 2, Section 3.2, instruction AAD) explains that the 0A is hard-coded there to represent "base 10." Only the D5 byte represents the AAD instruction; 0A is actually just a hard-coded operand! The manual even goes on to explain that this byte can be changed, just not in assembly: assemblers always emit 0A for the AAD mnemonic, so any other base has to be written directly in machine code.

So if we moved 0101 (hex) into AX (our source data), and used the machine code of D5 02 (AAD with "base 2"), our result in AL is 3.

This is because 11 is binary for 3 (decimal or hex).  This actually occurs when run (because I test these things...).  But to be clear, it's the Intel manual using the word "base."
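The manual's description of the immediate byte as a number base can be modeled the same way. In this sketch (hypothetical helper names; flags not modeled) the base becomes a parameter, and a second helper hand-assembles the two-byte D5 imm8 encoding that an assembler won't produce for you.

```python
def encode_aad(base: int) -> bytes:
    """Hand-assemble AAD with an arbitrary base: opcode D5 followed by imm8."""
    if not 0 <= base <= 0xFF:
        raise ValueError("imm8 must fit in one byte")
    return bytes([0xD5, base])

def aad(ax: int, base: int = 0x0A) -> int:
    """Return the new AX after AAD imm8: AL = (AL + AH * base) & 0xFF, AH = 0."""
    ah, al = (ax >> 8) & 0xFF, ax & 0xFF
    return (al + ah * base) & 0xFF

print(encode_aad(2).hex(" "))  # d5 02
print(hex(aad(0x0101, 2)))     # 0x3  (binary 11)
print(hex(aad(0x0307, 8)))     # 0x1f (octal 37 is decimal 31)
```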

Again, a hacker may look at the above explanation for what the machine code layer of abstraction is supposed to be doing and consider what would happen if we used a D5 01 or D5 00 instruction.  In other words, what does base "1" or base "0" really mean?

Microcode

What if we set AX to 1337 and convert it with base 1, or with base 0?

Again, nothing breaks.  The results are 4A and 37, respectively.  Everything is still working as intended.

This is mostly because "base conversion" is just an abstract way to describe the results of what the microcode is doing; it works perfectly as a base converter given proper input data. But what is it really doing? At this point, we have to trust the Intel manual's pseudocode for what its microcode is doing, because the microcode is Intel's secret.

This is truly concerning to me, considering that an instruction like RDRAND could operate in a way that circumvents crypto functions (see POC||GTFO Issue 3, Article 6: "Prototyping an RDRAND Backdoor in Bochs" by Taylor Hornby).

I digress.

A simplified version of what the microcode for AAD is doing is: AL = AL + (AH * base). (The manual's pseudocode also keeps only the low 8 bits of that result and then sets AH to 0.)

This math assumes these values are hex, not decimal. AX is two bytes made up of AH and AL, and the base is the machine code byte you supply after the D5.
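Written out as code, the formula needs two details from the Intel manual's pseudocode: the sum is truncated to 8 bits, and AH is set to 0 afterwards. Here is a minimal Python model of that pseudocode. Flags are ignored, and this is only a description of the observable result, not Intel's microcode.

```python
def aad(ax: int, base: int) -> int:
    """Model AAD imm8 on a 16-bit AX; returns the new AX (flags are not modeled)."""
    temp_al = ax & 0xFF          # low byte
    temp_ah = (ax >> 8) & 0xFF   # high byte
    al = (temp_al + temp_ah * base) & 0xFF  # keep only 8 bits, as the pseudocode does
    ah = 0
    return (ah << 8) | al

# Even the largest inputs cannot "break" it; the result simply wraps.
print(hex(aad(0xFFFF, 0xFF)))  # 0x0 (0xFF + 0xFF * 0xFF = 0xFF00, low byte 0x00)
```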

So, to review our first example, "base 10" converting 1337:

If we put 1337 into AX, then AH is 13 and AL is 37.  (Remember, hex 0A is decimal 10)

To work the formula:

13 * 0A = BE
BE + 37 = F5

At this layer of abstraction, the instruction worked as intended.

Let's work the "base 1" conversion:

13 * 01 = 13
13 + 37 = 4A

What about "base 0?"  Well:

13 * 00 = 00
00 + 37 = 37
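The three worked examples can be replayed mechanically. This sketch (a hypothetical helper, flags ignored) prints the same two steps, multiply and then add, in hex for each base, so the intermediate values can be checked against the hand-worked ones.

```python
def show_aad_steps(ax: int, base: int) -> int:
    """Print the multiply and add steps of AAD in hex and return the resulting AL."""
    ah, al = (ax >> 8) & 0xFF, ax & 0xFF
    product = ah * base
    result = (product + al) & 0xFF  # AAD keeps only the low byte of the sum
    print(f"{ah:02X} * {base:02X} = {product:02X}")
    print(f"{product:02X} + {al:02X} = {result:02X}")
    return result

for base in (0x0A, 0x01, 0x00):
    show_aad_steps(0x1337, base)
```

For these inputs the output matches the hand-worked lines exactly, from "13 * 0A = BE" down to "00 + 37 = 37".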

You could actually use the D5 00 instruction as a clever way to clear the AH register while leaving AL unchanged (instead of MOV AH, 0 or XOR AH, AH).
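To sanity-check the D5 00 trick, the model below compares it against XOR AH, AH for every possible AX value. Only the register result is compared: flags are not modeled, and these instructions do not leave the flags the same way. Also note that AAD is invalid in 64-bit mode, so the trick only applies to 16- and 32-bit code.

```python
def aad(ax: int, base: int) -> int:
    """Model of AAD imm8 on 16-bit AX: AL = (AL + AH * base) & 0xFF, AH = 0 (flags ignored)."""
    return ((ax & 0xFF) + ((ax >> 8) & 0xFF) * base) & 0xFF

def xor_ah_ah(ax: int) -> int:
    """Model of XOR AH, AH (or MOV AH, 0): AH becomes 0, AL is untouched (flags ignored)."""
    return ax & 0x00FF

# With base 0 the AH * base term vanishes, so AAD only zeroes AH.
same = all(aad(ax, 0x00) == xor_ah_ah(ax) for ax in range(0x10000))
print(same)  # True
```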

Exploitation

When exploiting a stack-based buffer overflow, your code ends up on the stack and you have to jump to it.

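You may not know the address your buffer starts at, but the Extended Stack Pointer (ESP) register does: right after the vulnerable function returns, ESP points just past the overwritten return address, into data you control.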

If you can, you want to find existing code in the program (or its libraries) that isn't protected by technologies such as Address Space Layout Randomization (ASLR) and that contains an instruction like JMP ESP, which would effectively jump to your exploit code. Tools like Mona can find this for you. If you find one, you can manipulate the stack so that execution lands in your code via that JMP ESP. To do this, you have to make sure its address is at the top of the stack (in the part of the buffer you control, overwriting the saved return address) when the program returns from its vulnerable function.
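The layout described above can be sketched in Python. Everything here is hypothetical: the 64-byte offset to the saved return address, the 0x08049123 gadget address, and the placeholder "shellcode" (INT3 breakpoints, byte CC) would all come from your own analysis of the target. The point is the order: filler up to the saved return address, the gadget address as a 32-bit little-endian value, then the code ESP will point at after the return.

```python
import struct

OFFSET_TO_RET = 64          # hypothetical: bytes from buffer start to the saved return address
JMP_ESP_ADDR = 0x08049123   # hypothetical: address of a JMP ESP (FF E4) in a non-ASLR module
SHELLCODE = b"\xcc" * 16    # placeholder: INT3 breakpoints instead of a real payload

def build_payload(offset: int, gadget: int, code: bytes) -> bytes:
    """Filler, then the gadget address (32-bit little-endian), then the code.

    Assuming a plain RET, the vulnerable function pops the gadget address into EIP,
    leaving ESP pointing at `code`, which the JMP ESP then runs.
    """
    filler = b"A" * offset
    return filler + struct.pack("<I", gadget) + code

payload = build_payload(OFFSET_TO_RET, JMP_ESP_ADDR, SHELLCODE)
print(len(payload))  # 84
```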

When searching for this JMP ESP, you're really searching for its machine code. To find out what that machine code is, most people use a tool like Metasploit's nasm_shell.rb.
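Once you know the bytes, the search itself is simple. The sketch below scans a byte string for FF E4, the encoding of JMP ESP. Reading the bytes out of an actual executable section and translating offsets into virtual addresses are left out; the `code` bytes are a made-up sample and `find_all` is a hypothetical helper.

```python
def find_all(haystack: bytes, needle: bytes) -> list[int]:
    """Return every offset where `needle` occurs in `haystack` (overlaps allowed)."""
    hits, start = [], 0
    while (pos := haystack.find(needle, start)) != -1:
        hits.append(pos)
        start = pos + 1
    return hits

JMP_ESP = bytes.fromhex("ffe4")

# Hypothetical code bytes; in practice, read them from the target's executable section.
code = bytes.fromhex("5589e583ec08ffe490c3ffe4")
print(find_all(code, JMP_ESP))  # [6, 10]
```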

If you give NASM shell some assembly, it spits back the machine code. Sometimes you won't find JMP ESP. However, you may find a code sequence like MOV EAX, ESP followed by JMP EAX, which achieves the same result. It's rare, but now we have to get creative.

Here's the issue, though: nasm_shell.rb will give you 89 E0 for MOV EAX, ESP. The kicker is that 8B C4 is machine code for the exact same assembly! Knowing that assembly is too high-level, and knowing which machine code to search for, can turn previously unexploitable vulnerabilities into exploitable ones (this is cool).
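To make the point concrete: a search that only looks for the bytes nasm_shell.rb reports (89 E0) misses the 8B C4 form entirely. This sketch searches for either encoding of MOV EAX, ESP immediately followed by JMP EAX (FF E0). It only checks directly adjacent instructions, `find_mov_jmp` is a hypothetical helper, and the sample bytes are made up.

```python
MOV_EAX_ESP = (bytes.fromhex("89e0"), bytes.fromhex("8bc4"))  # two encodings, one instruction
JMP_EAX = bytes.fromhex("ffe0")

def find_mov_jmp(code: bytes) -> list[tuple[int, str]]:
    """Return (offset, encoding) for each MOV EAX, ESP directly followed by JMP EAX."""
    hits = []
    for mov in MOV_EAX_ESP:
        gadget = mov + JMP_EAX
        start = 0
        while (pos := code.find(gadget, start)) != -1:
            hits.append((pos, mov.hex(" ")))
            start = pos + 1
    return sorted(hits)

# Hypothetical code bytes containing only the 8B C4 form.
code = bytes.fromhex("90908bc4ffe0c3")
print(find_mov_jmp(code))                    # [(2, '8b c4')]
print(code.find(bytes.fromhex("89e0ffe0")))  # -1: the nasm_shell.rb encoding is absent
```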

A proof of concept (kitteh) is listed in the resources section below.

Why the 8B C4 redundancy?

In x86, you can't directly do most operations (including MOV) from a memory location to a memory location.  You can do register-to-memory, memory-to-register, and register-to-register... just not memory-to-memory.

The 89 form of MOV allows for a memory location or a register as the destination, and only a register as the source.

The 8B form of MOV allows for only a register as the destination, and either a memory location or a register as the source.

Note that both of these forms allow a register as either the source or the destination; hence the redundancy, and hence the obscure title of this article: 8B C4 is MOV EAX, ESP and FF E0 is JMP EAX.
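The redundancy falls out of the ModRM byte that follows the opcode. When its top two bits (mod) are 11, both operands are registers: the middle three bits (reg) name one register and the low three bits (r/m) name the other. Opcode 89 means "r/m receives reg" and 8B means "reg receives r/m" (the two opcodes differ by a single bit, often called the direction bit), so swapping that bit and the two register fields yields the same instruction. This decoder is a sketch that handles only that register-to-register case with 32-bit operands.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode_mov(opcode: int, modrm: int) -> str:
    """Decode a register-to-register MOV (opcode 89 or 8B) into Intel syntax."""
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 0b111, modrm & 0b111
    if mod != 0b11:
        raise ValueError("only register-direct (mod = 11) forms are handled")
    if opcode == 0x89:    # MOV r/m32, r32: destination is r/m, source is reg
        dst, src = rm, reg
    elif opcode == 0x8B:  # MOV r32, r/m32: destination is reg, source is r/m
        dst, src = reg, rm
    else:
        raise ValueError(f"not a MOV opcode handled here: {opcode:#04x}")
    return f"mov {REG32[dst]}, {REG32[src]}"

print(decode_mov(0x89, 0xE0))  # mov eax, esp  (E0 = 11 100 000)
print(decode_mov(0x8B, 0xC4))  # mov eax, esp  (C4 = 11 000 100)
print(decode_mov(0x89, 0xC4))  # mov esp, eax  (same ModRM, other direction)
```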

Summary

Abstractions are useful, but they are almost always simplifications or, at best, standardizations.

These simplifications are "lossy;" we lose control when using them.  As a "user," this is completely okay; we would rather "lose control" over the tedious stuff and just get some useful work done.

However, as hackers, we like to dial the abstractions down as low as we can for complete control. By its very nature, this means we will need to do some tedious work; there is typically no flashy, immediate gratification at this level. For me, the quickest path to constructive hacking is to explore at the low level what the high level doesn't offer: diving deep into the negative space.

Resources/References/Filez

POC||GTFO (there are many other mirrors as well): International Journal of Proof-of-Concept or Get The F*ck Out

"Assembly is Too High Level" blog series: xlogicx.net/?cat=4

Intel Manual: www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf

Intel Manuals: Intel 64 and IA-32 Architectures Software Developer Manuals

Vulnerable cat like program: kitteh

Source for kitteh: kitteh.asm

Exploit PoC for kitteh (run ./kitteh file.txt): file.txt

If you want to do nasm_shell in reverse and type machine code to get assembly (syntax: perl m2elf.pl --interactive): github.com/Xlogicx/m2elf
