Fun With Data Entropy Attacks

by Spacedawg

Today, compression is used everywhere.

Most modern file formats, networks, and computing systems are optimized to reduce wasted space in buffers, memory, and storage. How sophisticated that compression is varies, and so does how much it actually helps in a given situation. In this article I will explain how one can use knowledge of these variances to gain an advantage over a target system by bending and breaking the rules.

High-Entropy = Bad Time for Compression

In general, data entropy is a measure of disorder, or unpredictability, in a block of data. To say data has high entropy is to say it is in a low-ordered state, like a block of truly randomly generated data. Conversely, a block of raw ASCII alphanumeric text is in a low-entropy state and is well suited to compression by an appropriate algorithm. The same holds for most raw representations of data, such as sound, video, and images, all of which contain large, ordered, repetitive patterns in their structure.
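To put rough numbers on this, here is a minimal Python sketch that estimates entropy from byte frequencies, on a scale from 0 bits per byte (every byte identical) to 8 bits per byte (uniformly random bytes). The `shannon_entropy` helper and the sample inputs are illustrative choices, and a single-byte frequency count is only one crude way to estimate entropy: it ignores the order the bytes appear in.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of `data` in bits per byte (0.0 to 8.0).

    H = sum(p * log2(1/p)) over the frequency p of each distinct byte value.
    Only single-byte frequencies are counted; byte order is ignored.
    """
    if not data:
        return 0.0
    total = len(data)
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())


samples = {
    "zeros": bytes(65536),                                       # one repeated value
    "ascii text": b"Today compression is used everywhere. " * 1724,  # repetitive text
    "random": os.urandom(65536),                                 # OS randomness
}

for name, blob in samples.items():
    print(f"{name:>10}: {shannon_entropy(blob):.3f} bits/byte")
```

Zeros come out at 0, the repeated sentence at roughly 3.9, and the random bytes just under 8.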

So what about encrypted data?

It appears random from the outside, yet it is highly structured, but only to the cipher algorithm that produced it, and only when that algorithm is given the correct keys. To any compression algorithm, encrypted data is indistinguishable from random data and is high-entropy. As a result, the common rule is to do all compression on a data block before encrypting it, in order to reap the benefits of both algorithms: size reduction and security.
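Here is a small Python sketch of why the order matters. Python's standard library has no real cipher, so the hypothetical `toy_encrypt` XORs the data with a SHAKE-256 keystream as a stand-in for a proper cipher; it is illustration only, not something to protect real data with. The sample message is an arbitrary low-entropy string.

```python
import hashlib
import os
import zlib


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """XOR plaintext with a SHAKE-256 keystream. Illustration only, NOT a vetted cipher.

    A fresh random nonce is prepended so the same key never reuses a keystream.
    """
    nonce = os.urandom(16)
    stream = hashlib.shake_256(key + nonce).digest(len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


key = os.urandom(32)
message = b"All work and no play makes Jack a dull boy. " * 2000  # 88,000 bytes, low entropy

# Usual order: compress first, then encrypt the (already small) result.
compress_then_encrypt = toy_encrypt(key, zlib.compress(message))

# Reversed order: the ciphertext looks random, so compression gains nothing.
encrypt_then_compress = zlib.compress(toy_encrypt(key, message))

print("original:           ", len(message))
print("compress -> encrypt:", len(compress_then_encrypt))
print("encrypt -> compress:", len(encrypt_then_compress))
```

Compressing first shrinks the message to a tiny fraction of its size before encryption; encrypting first leaves zlib nothing to work with, so the output stays at about the original size.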

How much entropy data appears to have depends on the process it is being run through.
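A quick way to see this is to take a genuinely random 4 KB block and repeat it. A byte-frequency entropy estimate rates the result as nearly random, while zlib (whose DEFLATE format can refer back up to 32 KB) spots the repetition and compresses it heavily. The block size and repeat count are arbitrary choices for this sketch.

```python
import math
import os
import zlib
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy over single-byte frequencies, in bits per byte."""
    total = len(data)
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())


block = os.urandom(4096)   # genuinely random 4 KB block
data = block * 256         # 1 MiB: the same random block repeated

# Judged byte by byte, the data looks almost perfectly random...
print(f"byte entropy: {byte_entropy(data):.3f} bits/byte")

# ...but DEFLATE's back-references find each copy 4096 bytes earlier.
compressed = zlib.compress(data)
print(f"zlib: {len(data)} -> {len(compressed)} bytes")
```

Same data, two "processes", two very different verdicts on how ordered it is.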

How is This Useful for Hacking?

With this in mind, if we create a large, low-entropy data block with a highly structured layout suited to a high compression ratio, and inject it into a system that uses even the simplest compression, we can transport a huge amount of data to the target in a short time. When it reaches its destination and is decompressed, it will expand to rapidly fill buffers, memory, and even petabytes of hard drive space on an unsuspecting target.

This is similar in method to the recent DNS vulnerability where exploit code was used to flood the target's buffer until the network failed, except that we are using the system's own compression to transport our generated data. So what is the lowest-entropy data structure we can fit into 10 TB that almost any compression algorithm can shrink to a tiny fraction of its size? It's simple: how about 10 TB of 0s (zeros)!
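A minimal Python sketch of the idea at a safe scale: stream 64 MiB of zeros through zlib, expand it again, and extrapolate the ratio to 10 TB. The chunk and payload sizes are arbitrary. One caveat worth knowing: a single pass of DEFLATE (the format zlib uses) tops out at roughly 1000:1, so even 10 TB of zeros would still be on the order of 10 GB of compressed data; schemes that store a run length plus a byte value can shrink one long run much further.

```python
import zlib

CHUNK = 1 << 20          # feed 1 MiB of zeros at a time
TOTAL = 64 * CHUNK       # 64 MiB test payload, a stand-in for 10 TB

# Compress in a streaming fashion so the whole payload never sits in memory at once.
compressor = zlib.compressobj(level=9)
zero_chunk = bytes(CHUNK)
compressed = b"".join(compressor.compress(zero_chunk) for _ in range(TOTAL // CHUNK))
compressed += compressor.flush()

# The receiving end expands it back to the full 64 MiB.
expanded = len(zlib.decompress(compressed))
ratio = expanded / len(compressed)

print(f"{TOTAL} bytes of zeros -> {len(compressed)} bytes compressed (~{ratio:.0f}:1)")
print(f"10 TB at the same ratio -> ~{10e12 / ratio / 1e9:.1f} GB on the wire")
```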

Real World Example #1 - Practical Example

Free stuff on FTP.

Disclaimer:  I'm not proud of this, but at the time it had to be done.

Back in the day, before file sharing programs like BitTorrent and Napster, people mainly shared files by setting up local FTP servers. Users would connect to these FTP sites and upload media files requested by the server owner, and an automated script would keep count of the files uploaded and grant the user download privileges based on a ratio of the data uploaded, usually 2:1. Living in the back end of nowhere with a 56k dial-up modem, an ISP dial-up rate of 10c per minute (no free local ISP calls in my country), and no files to trade, I could not really play by the rules. So...

Step 1:  Open MS Paint.

Step 2:  Make a new image and increase the canvas size to several times the screen size.

Step 3:  Save as a 24-bit uncompressed bitmap file (lots of identical bytes).

Step 4:  Check the file size, then enlarge the canvas and re-save until the file size approaches 5 MB.

Step 5:  Write "sorry" on the bitmap (it doesn't affect the entropy much).

Step 6:  Rename the file from untitled.bmp to Britney Spears - Hit Me Baby One More Time.mp3.

Step 7:  Upload the file to the FTP server over the 56k modem at >300 kbps (!!!!).

Step 8:  Quickly download files (at normal speed) from the FTP server before the owner finds your corrupted MP3 and boots you.
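Steps 1 through 4 can be approximated without Paint. This Python sketch writes the headers of an uncompressed 24-bit BMP by hand and fills every pixel with one colour, then checks how well the result compresses. `make_blank_bmp` is a hypothetical helper, and the white default assumes Paint's blank canvas; that means long runs of 0xFF bytes rather than literal zeros, which compress just as well.

```python
import struct
import zlib


def make_blank_bmp(width: int, height: int, rgb=(255, 255, 255)) -> bytes:
    """Build an uncompressed 24-bit BMP filled with a single colour."""
    row_size = (width * 3 + 3) // 4 * 4          # each pixel row is padded to 4 bytes
    pixel_bytes = row_size * height
    r, g, b = rgb
    row = bytes([b, g, r]) * width               # BMP stores pixels as B, G, R
    row += bytes(row_size - len(row))            # zero padding at the end of each row

    # 14-byte file header: signature, file size, two reserved fields, pixel data offset.
    file_header = struct.pack("<2sIHHI", b"BM", 54 + pixel_bytes, 0, 0, 54)
    # 40-byte BITMAPINFOHEADER.
    info_header = struct.pack(
        "<IiiHHIIiiII",
        40,            # header size
        width, height,
        1,             # colour planes
        24,            # bits per pixel
        0,             # BI_RGB: no compression
        pixel_bytes,
        2835, 2835,    # about 72 DPI, expressed in pixels per metre
        0, 0,          # no palette
    )
    return file_header + info_header + row * height


bmp = make_blank_bmp(1320, 1320)                 # roughly 5 MB, as in step 4
packed = zlib.compress(bmp)
print(f"BMP: {len(bmp)} bytes, zlib-compressed: {len(packed)} bytes")
```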

I learned that the modem's built-in compression could take a run of "0"s and send, in effect, "30 0" instead of "000000000000000000000000000000", which let my payload upload at a fantastic speed. If you were the owner of any of these FTP servers, I hope you found the BMP header data and my embedded apology. I'm sure this type of entropy attack could be adapted for effective use in modern DDoS, network exploit, and fuzzing attacks.
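The "30 0" trick described above is run-length encoding. Modems of that era typically used V.42bis, a dictionary-based scheme rather than plain RLE, so treat this Python sketch as a simplified model of the effect, not of the modem's actual algorithm. The `rle_encode`/`rle_decode` helpers are hypothetical; runs are capped at 255 so each count fits in one byte.

```python
def rle_encode(data: bytes) -> bytes:
    """Encode data as (count, byte) pairs, with runs capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        # Extend the run while the next byte matches and the count still fits in a byte.
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes([run, data[i]])
        i += run
    return bytes(out)


def rle_decode(encoded: bytes) -> bytes:
    """Expand (count, byte) pairs back into the original data."""
    out = bytearray()
    for count, value in zip(encoded[0::2], encoded[1::2]):
        out += bytes([value]) * count
    return bytes(out)


line = b"0" * 30
print(rle_encode(line))  # b'\x1e0': a count of 30, then the byte "0"
```

Thirty bytes become two; a long enough run of identical bytes shrinks by up to 255:1 even with this crude scheme.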

Real World Example #2 - Hypothetical Example

Utilizing high data entropy to protect Internet privacy.

We now live in a world of almost total surveillance. For the average individual, most of the gigabytes of data that travel in and out of our broadband routers (streaming video, music, app downloads, etc.) are quickly indexed, and the redundant data is discarded by the man-in-the-middle.

The bulk of the data we send and receive that is personal, unique, or creative is relatively small, unencrypted, and easily stored for processing. Smaller still is the average user's encrypted traffic, which can be, and is, collected, sorted, filtered, and stored indefinitely.

Encrypted traffic is difficult to identify specifically using Deep Packet Inspection (DPI) methods. Instead, all unknown, high-entropy traffic is assumed to be encrypted data, and all of it is collected and saved for processing and decoding at a later time. This is something we can work with...
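Here is a deliberately naive Python sketch of that kind of entropy-based guess. The 7.5 bits-per-byte threshold and the `looks_encrypted` helper are assumptions for illustration; real DPI systems combine many more signals (ports, handshakes, packet sizes and timing). The point is that pure noise trips the same check as real ciphertext.

```python
import math
import os
from collections import Counter

# Hypothetical cut-off chosen for this sketch.
ENCRYPTED_THRESHOLD = 7.5  # bits per byte


def byte_entropy(payload: bytes) -> float:
    """Shannon entropy of the payload's byte frequencies, in bits per byte."""
    total = len(payload)
    return sum((n / total) * math.log2(total / n) for n in Counter(payload).values())


def looks_encrypted(payload: bytes) -> bool:
    """Naive heuristic: flag near-random payloads as 'probably encrypted'."""
    return len(payload) > 0 and byte_entropy(payload) >= ENCRYPTED_THRESHOLD


plain_http = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n" * 20
noise = os.urandom(1400)  # meaningless random bytes, roughly one packet's worth

for name, payload in [("plain HTTP", plain_http), ("random noise", noise)]:
    print(f"{name:>12}: {byte_entropy(payload):.2f} bits/byte, "
          f"flagged={looks_encrypted(payload)}")
```

The plaintext request scores far below the threshold, while the random bytes are flagged even though they carry no message at all.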

Lowering the Signal-to-Noise Ratio

If there were a peer-to-peer network that did nothing but send streams of meaningless high-entropy data to participating nodes on the Internet, those who would hold all of our private communications would need to dramatically increase their storage capacity.

This also breaks the web of association that those watching us like to draw between individuals, as it appears that we are all always connecting to one another through small, intermittent encrypted channels. The data could not simply be ignored or discarded, because some users could still embed real encrypted messages in the data stream amid the overwhelming noise. While those encrypted communications might still be deciphered, the job of identifying encrypted traffic interlaced within a high-entropy data stream would become a painstaking, manual process, prone to false positives and wasted resources, perhaps to the point of making mass surveillance a financially unviable endeavor.
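A sketch of how a participant might hide a real message among noise frames: every frame is the same size and looks random, and only a receiver holding the shared key can tell which one authenticates. Everything here is an assumption for illustration: the frame layout, the hypothetical `seal`/`chaff`/`try_open` helpers, and especially the SHAKE-256 XOR stream cipher, which stands in for a real authenticated cipher such as AES-GCM or ChaCha20-Poly1305 and should not be used as-is.

```python
import hashlib
import hmac
import os
import random
from typing import Optional, Tuple

FRAME_SIZE = 256   # hypothetical fixed frame size; every frame on the wire is this long
NONCE_SIZE = 16
TAG_SIZE = 16
BODY_SIZE = FRAME_SIZE - NONCE_SIZE - TAG_SIZE


def _subkeys(key: bytes) -> Tuple[bytes, bytes]:
    """Derive separate encryption and MAC keys from one shared secret."""
    return hashlib.sha256(b"enc" + key).digest(), hashlib.sha256(b"mac" + key).digest()


def seal(key: bytes, message: bytes) -> bytes:
    """Encrypt and authenticate a short message into one fixed-size frame (toy cipher)."""
    if len(message) > BODY_SIZE - 2:
        raise ValueError("message too long for one frame")
    enc_key, mac_key = _subkeys(key)
    body = (len(message).to_bytes(2, "big") + message).ljust(BODY_SIZE, b"\0")
    nonce = os.urandom(NONCE_SIZE)
    stream = hashlib.shake_256(enc_key + nonce).digest(BODY_SIZE)
    ciphertext = bytes(a ^ b for a, b in zip(body, stream))
    tag = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()[:TAG_SIZE]
    return nonce + ciphertext + tag


def chaff() -> bytes:
    """A noise frame: pure random bytes, the same size as a real one."""
    return os.urandom(FRAME_SIZE)


def try_open(key: bytes, frame: bytes) -> Optional[bytes]:
    """Return the message if the frame authenticates under `key`, else None (chaff)."""
    enc_key, mac_key = _subkeys(key)
    nonce, ciphertext, tag = frame[:NONCE_SIZE], frame[NONCE_SIZE:-TAG_SIZE], frame[-TAG_SIZE:]
    expected = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()[:TAG_SIZE]
    if not hmac.compare_digest(tag, expected):
        return None
    stream = hashlib.shake_256(enc_key + nonce).digest(len(ciphertext))
    body = bytes(a ^ b for a, b in zip(ciphertext, stream))
    length = int.from_bytes(body[:2], "big")
    return body[2:2 + length]


shared_key = os.urandom(32)
stream_frames = [chaff() for _ in range(99)]
stream_frames.insert(random.randrange(100), seal(shared_key, b"meet at the usual place"))

# Only a receiver holding shared_key can pick the real frame out of the noise.
received = [m for f in stream_frames if (m := try_open(shared_key, f)) is not None]
print(received)
```

To an observer without the key, all 100 frames are the same length and all look like high-entropy noise.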

Shoutout to Crunchman, Dublin 2600, and the TOG Hackerspace crew.
