robots.txt Mining Script for the Lazy
Hackers are lazy.
I am; I like to have a tool to do everything for me.
How often do you troll a hacker BBS and find the post "HELP MUST GET WORKING IN WINDOWZ"?
No doubt it's from a script kiddy who has no idea, nor will he take the time to look up, what a compiler and make are used for. This'll be followed up by the proverbial reply from "DarkLord" (you know, the guy with the 3000 post count), who locks the thread with a "learn to Google" reply.
Sure, there is good reason to make people get smarter and use tools, but, then again, who cares?
I think it must all be an ego thing - I was that dumb kid some years ago, asking how to get some tool to work in Windows, knowing little more than how to break it.
What I'm here to do today is help the script kiddies hack on web servers.
The world has taken me to penetration testing, using the big, cool boy tools.
Nessus is a good place to start (if you didn't know) and, yes, it runs on Windows.
However, something that always bugged me about Nessus reports was the little line: "Server contains a robots.txt, please examine for further detail."
I don't want to go examine it, that's why I'm using this automated tool in the first place. I'm lazy, get on with it!
Now, a quick little history lesson.
If you didn't know, robots.txt was (and is) a file used to set rules for user agents visiting a site, specifically telling them where not to look. This applies particularly to search engines: people didn't want search engines to index their entire site and spit out content that is dynamic or, in the case of 2600 readers, content that is private, confidential, or otherwise shouldn't be on the web publicly.
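For reference, a typical robots.txt looks something like this (the paths and sitemap URL are invented for illustration):

```
User-agent: *
Crawl-delay: 10
Disallow: /cgi-bin/
Disallow: /private/
Sitemap: http://example.com/sitemap.xml
```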
A practice that is not as prevalent as it was back in the good old days is to hide folders from Google, etc. with robots.txt. Yes, people would stoop to such levels as that. So first, why is this so horrible? Sure, Google is friendly and they play by the rules. But who is to say that the hackoogle search engine won't just pop up, say F.U. to robots.txt, start scouring the domain for anything tasty, index it, and let people search for juicy "nuggets?"
Back to the 31337 web site operators, how is this robots.txt good for them?
Well, those people that put /CVS into it might be leaving the world a free copy of their code.
My personal favorites are smaller software firms that put /download, /ftp, or /registered into the robots.txt file.
These are great places to start mining around for default pages that will let you download full copies of an application without paying for it. Not like anyone here would do that.
The basics of looking at a robots.txt are very simple.
Browse to 2600.com/robots.txt and any web browser will pull back the TXT file.
Cool.
Well, again, this is nice, but you must then cut-and-paste the results into the URL bar to see the goodies, or hit the Back button, or Tab all over. Who needs that?
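The manual drill goes something like this sketch: grab the file, then rewrite each Disallow path into a full URL you can paste into a browser. The host name and robots.txt contents below are made up for illustration; on a real run you'd fetch the file with wget first.

```shell
# Stand-in for a downloaded robots.txt (contents invented for this example)
host="example.com"
printf 'User-agent: *\nDisallow: /cgi-bin/\nDisallow: /private/\n' > robots.txt

# Turn each Disallow path into a full URL worth poking at
grep '^Disallow:' robots.txt | sed "s|^Disallow: *|http://$host|"
```

That pipeline prints one URL per Disallow line; the script below just does the same idea with HTML wrappers so the result is clickable.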
I have come to the rescue of the script kiddy - I recently broke my ankle and, after getting frustrated with the motorcycle missions 40% of the way into GTA-IV, I wrote this script.
It's very simple, just putting HTML wrappers on things, but I hope to make the day much simpler for someone somewhere.
#!/bin/bash
# robotReporter.sh -- a script for creating web server robots.txt clickable
# reports
# by KellyKeeton.com (c)2008

version=.06

# Don't forget to chmod 755 robotReporter.sh or there will be no 31337 h4x0r1ng

if [ "$1" = "" ]; then # Deal with command line nulls
	echo
	echo robotReporter$version - Robots.txt report generator
	echo will download and convert the robots.txt
	echo on a domain to a HTML clickable map.
	echo
	echo Usage: robotReporter.sh example.com -b
	echo
	echo -b keep original of the downloaded robots.txt
	echo
	exit
fi

wget -m -nd http://$1/robots.txt -o /dev/null # Download the robots.txt file

if [ -f robots.txt ]; then # If the file is there, do it

	if [ "$2" = "-b" ]; then # Don't delete the robots.txt file
		cp robots.txt robots_$1.html
		mv robots.txt robots_$1.txt
		echo "###EOF Created on $(date +%c) with host $1" >> robots_$1.txt
		echo "###Created with robotReporter $version - KellyKeeton.com" >> robots_$1.txt
	else
		mv robots.txt robots_$1.html
	fi

	# HTML generation using sed
	sed -i "s/#\(.*\)/ \r\n#\1<br>/" robots_$1.html # parse comments
	sed -i "/Sitemap:/s/: \(.*\)/ <a href=\"\1\">\1<\/a><br>/" robots_$1.html # parse the sitemap lines
	sed -i "/-agent:/s/$/<br>/" robots_$1.html # parse user-agent lines
	sed -i "/-delay:/s/$/<br>/" robots_$1.html # parse crawl-delay lines
	sed -i "/llow:/s/\/\(.*\)/ <a href=\"http:\/\/$1\/\1\">\1<\/a> <br>/" robots_$1.html # parse all Dis/Allow lines

	echo "<br>Report ran on $(date +%c) with host <a href=\"http://$1\">$1</a><br>Created with robotReporter $version - <a href=\"http://www.kellykeeton.com\">KellyKeeton.com</a>" >> robots_$1.html

	echo report written to $(pwd)/robots_$1.html # done

else # wget didn't pull the file
	echo $1 has no robots.txt to report on.
fi
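If the sed one-liners read like line noise, here is what the Dis/Allow rule does to a single line, with the script's $1 host replaced by a made-up example.com for illustration:

```shell
# The Dis/Allow rule from the script, applied to one sample line.
# It captures everything from the first slash onward and wraps it in an
# <a> tag pointing at the host, plus a <br> so the report shows one
# entry per line.
echo 'Disallow: /cgi-bin/' | \
  sed "/llow:/s/\/\(.*\)/ <a href=\"http:\/\/example.com\/\1\">\1<\/a> <br>/"
```

Run the whole file through rules like that and every forbidden path becomes a clickable link in the HTML report.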
Example Usage:
$ ./robotReporter.sh

robotReporter.06 - Robots.txt report generator
will download and convert the robots.txt
on a domain to a HTML clickable map.

Usage: robotReporter.sh example.com -b

-b keep original of the downloaded robots.txt

$ ./robotReporter.sh 2600.com -b
report written to robots_2600.com.html