Internet Archaeology

by ilikenwf  (Matt Parnell)

Archaeology is the practice of unearthing artifacts that are old, long lost, or forgotten.

The Internet is no different from the real world in the sense that it, too, has artifacts of media from days gone by.  You just have to know where to look.  The best place to start is the Internet Archive's Wayback Machine, which houses over eight petabytes of old information gleaned from the earliest days of the Internet up to now.  Just put in an address and, provided it was indexed, you can view the site as it looked as far back as 1996.

Beginning Methodology

I wanted to find as much "lost" TechTV and ZDTV media as possible, for nostalgia's sake.  Starting out, I was just viewing the sites by individual archive dates.  This was far too tedious and time-consuming to be worthwhile, and it didn't really give me much to work with.  Digging around on the archive's information pages, I discovered that searching sites with the * wildcard is supported.

To give it a shot, I typed in: http://www.techtv.com/*

As well as: http://www.zdtv.com/*
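If you'd rather script this than page through results in a browser, the Internet Archive also exposes its capture index through the CDX server API, which is a different interface from the search page used here.  The sketch below only builds the query URL and parses the JSON that comes back.  The endpoint and the `url`, `output`, `fl`, `collapse`, and `limit` parameters are taken from the CDX server documentation as I understand it, and the helper names are hypothetical.  Fetching the URL (with `urllib.request` or just a browser) is left to you.

```python
from urllib.parse import urlencode

# Wayback CDX server endpoint (assumed from the CDX server docs)
CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def cdx_query_url(pattern: str, limit: int = 1000) -> str:
    """Build a CDX query listing captures that match a wildcard pattern."""
    params = {
        "url": pattern,                        # e.g. "techtv.com/*"; trailing * means prefix match
        "output": "json",                      # rows come back as JSON arrays
        "fl": "timestamp,original,mimetype",   # only the fields we care about
        "collapse": "urlkey",                  # one row per distinct URL
        "limit": str(limit),                   # cap the number of rows returned
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def parse_cdx_rows(rows: list[list[str]]) -> list[dict[str, str]]:
    """Turn the JSON response (header row first) into one dict per capture."""
    if not rows:
        return []
    header, *data = rows
    return [dict(zip(header, row)) for row in data]
```

For example, `cdx_query_url("zdtv.com/*")` gives a URL you can open directly; the JSON it returns can be passed to `parse_cdx_rows` after loading it with `json.loads`.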

These searches yielded long lists (45,000+ entries) of pages from the two domains.  At first, sifting through them was really slow, until I found a way to speed it up: go to the bottom of the search page and set the number of results displayed to 30.  When the page reloads, the URL will look like this: http://web.archive.org/web/*sr_1nr_30/http://url.com/*
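Because the page size lives right in the URL, you can rebuild that address yourself.  This minimal sketch just substitutes a new number into the `*sr_1nr_N` pattern quoted above.  It assumes the archive still honors that syntax, which may not hold for the current interface, and `listing_url` is a hypothetical helper name.

```python
def listing_url(target: str, per_page: int = 30) -> str:
    """Rebuild the old Wayback results URL with a different page size.

    Follows the pattern quoted in the article:
    http://web.archive.org/web/*sr_1nr_30/http://url.com/*
    """
    if per_page < 1:
        raise ValueError("per_page must be positive")
    # sr_1 = start at result 1, nr_N = show N results per page
    return f"http://web.archive.org/web/*sr_1nr_{per_page}/{target}"
```

For example, `listing_url("http://www.techtv.com/*", 500)` asks for 500 results per page.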

Now change the 30 to a larger number (one that won't crash your browser) and load the page from your edited URL.  The list will be much longer, so you don't have to click "Next" over and over again.  Then scroll or page down through the content, looking for interestingly named files and files with uncommon extensions, like PDF, PSD, ZIP, etc.
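Picking out interesting files from a long listing is easy to automate once you have the URLs as text (for example, copied out of the results page).  This sketch keeps only URLs whose path ends in one of a set of extensions.  The extension set is my own guess at "uncommon" types, loosely based on the ones mentioned in this article, so adjust it to taste.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical list of "uncommon" extensions worth a closer look
INTERESTING = {".pdf", ".psd", ".eps", ".zip", ".rar", ".exe", ".ttf"}


def interesting_files(urls, extensions=INTERESTING):
    """Return the URLs whose path ends in one of the given extensions."""
    hits = []
    for url in urls:
        path = urlparse(url).path                    # drops ?query and #fragment
        ext = posixpath.splitext(path)[1].lower()    # ".PSD" and ".psd" both match
        if ext in extensions:
            hits.append(url)
    return hits
```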

Find one and click the link.  If there is only one copy of that file in the archive, it will pop right up, unless it was indexed incorrectly.  Otherwise, you will get a choice of the dates the file was archived on.  Choose the first one, and keep working through the dates until you find a good, uncorrupted copy of the file (see the Tips and Tricks section for an explanation).
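If you already know a file's capture timestamps, you can skip the date-picker page.  This sketch assumes the usual 14-digit `YYYYMMDDhhmmss` Wayback timestamps and the `web/<timestamp>/<url>` snapshot address format.  The `id_` flag, which as far as I know asks for the file as captured without the archive's toolbar or link rewriting, is worth adding when you download binaries.  `is_good` is a placeholder for whatever check you use to decide a copy is uncorrupted.

```python
from typing import Callable, Iterable, Optional


def snapshot_url(timestamp: str, original: str, raw: bool = True) -> str:
    """Address of one archived capture; raw=True appends the id_ flag."""
    flag = "id_" if raw else ""
    return f"http://web.archive.org/web/{timestamp}{flag}/{original}"


def first_good_capture(timestamps: Iterable[str],
                       is_good: Callable[[str], bool]) -> Optional[str]:
    """Walk captures oldest-first and return the first one that passes is_good."""
    # Same-length 14-digit timestamps sort chronologically as plain strings
    for ts in sorted(timestamps):
        if is_good(ts):
            return ts
    return None  # no usable copy: time to move on
```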

Subdomains

The problem with this method is that it doesn't search a domain's subdomains.  To find those, use a WHOIS search, or look at the source (HTML, PHP, XML, etc.) of the site's pages and note the paths and hostnames they reference.  Using a combination of these methods, as well as my memory of the sites, I stumbled across subdomains like cache.techtv.com, chat.techtv.com, and more.
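Digging hostnames out of page source is mostly pattern matching.  This sketch scans HTML (or any text) for absolute `http://` or `https://` URLs and keeps the hosts that belong to the domain you care about.  It is a rough heuristic, not a parser: relative links are ignored, and a hostname only counts if it appears as part of a full URL.  Archived pages work well here because the archive's rewritten links (for example `/web/20010101000000/http://chat.techtv.com/`) still contain the original host.

```python
import re

# Host part of any absolute http(s) URL; stops at "/", ":", quotes, etc.
HOST_RE = re.compile(r"https?://([A-Za-z0-9.-]+)", re.IGNORECASE)


def subdomains_in_source(source: str, domain: str) -> set[str]:
    """Collect hostnames under `domain` mentioned anywhere in the source text."""
    domain = domain.lower()
    found = set()
    for host in HOST_RE.findall(source):
        host = host.lower().rstrip(".")
        # Require a dot boundary so "mytechtv.com" doesn't match "techtv.com"
        if host == domain or host.endswith("." + domain):
            found.add(host)
    return found
```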

You can see a list of the subdomains I found at: www.mattparnell.com/2600/techtvsubdomains.txt

See the Findings

Using the above methods, I searched the other domains and found all sorts of things: a font of Cat's handwriting, PSD and EPS source images for many of the shows' logos, lots of wallpapers, and avatars from the old ZDTV chat palace, among others.

I also found many video and sound clips from the old "Fox Kids" television network on the archived copies of: foxkids.com

All in all, I was very successful, and very pleased.  You can grab a copy of my discoveries from: www.mattparnell.com/arch.html

Practical Uses

These methods can be used for good or evil: you can see the inner workings of sites that have, since being archived, locked down areas that were once publicly open.  Sometimes you can even find media that was once free but is now charged for, thus saving you money.  In truth, the sky's the limit!  Have fun!

Some Tips and Tricks

1.)  These methods will also turn up files other than "web only" files, such as executables, ZIP files, and video files.

2.)  One problem is that some ZIP and EXE files were garbled or corrupted on their way into the archive (especially on older pages) and don't always work.  You can sometimes repair the ZIP files, but often it doesn't work.  Try finding another archive date with the same file.  If you can't, it's best to move on.
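Before bothering with repair, you can check whether a downloaded ZIP is intact.  This sketch uses Python's standard `zipfile` module: `testzip()` reads every member and returns the name of the first one with a bad CRC (or `None` if all are fine), and data that isn't a readable ZIP at all raises `BadZipFile`.  It only tells you whether a copy is damaged, so you know to try another archive date; it doesn't repair anything.

```python
import io
import zipfile
import zlib


def zip_is_intact(data: bytes) -> bool:
    """True if the bytes open as a ZIP and every member passes its CRC check."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            return zf.testzip() is None  # name of first bad member, or None
    except (zipfile.BadZipFile, zlib.error, EOFError):
        # Not a ZIP, truncated, or compressed data too mangled to inflate.
        # Other errors (e.g. encrypted members) are left to propagate.
        return False
```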

3.)  Note that you aren't really supposed to download from the archive.  People do it anyway, but make sure you don't sell the material you find, and use it for "educational" and "archival" purposes only.

Findings

These are hosted on Megaupload so that my site doesn't crash from bandwidth overuse.  Links are now working again.  These links will be updated as needed, both here and on my Downloads page.

TechTV Archaeological Findings (RAR): www.megaupload.com/?d=6CN9XD3H

Fox Kids.com (and other related domains) Archaeological Findings (RAR): www.megaupload.com/?d=NGST72BC

Shoutz: For what it's worth, shoutz Adrian Lamo at 2600, as well as Greg, Hevnsnt, CodedChaos, Surbo, and all the other guys at I-Hacked, and the Edge.  Have a good time at DEFCON, you lucky jerks!
