MP3 Data Stream as a Covert Means of Distributing Information

by enferex (mattdavis9@gmail.com) / A 757 Labs Effort (www.757labs.com)

Introduction

One of the great things about the collective brain of the Internet is the amount of information that can be exchanged as events occur in the tangible world.

Likewise, new music and other audio tracks can be consumed, furthering the expansion of musical interests and introducing new ideas to the masses.  Whether through Internet radio or downloadable tracks, the ability to disseminate information has become relatively common.

Just looking at the number of podcasts floating around, one can see the variety of information being spread.  The MP3 format has become a common choice for streaming media, as its underlying structure nicely supports that kind of data transfer.  However, one can leverage the properties of this format to transmit data that is not heard but can still be extracted.  This article discusses hiding information within an MP3 file that can later be streamed or downloaded.

Frames

An MP3 file is nothing more than a series of frames.

Each frame consists of a header followed by audio data.  This small header, four bytes long, provides information such as the bit and sampling rates, which describes the audio data that follows.  This header allows an audio player to reproduce the correct sounds.  From the bit rate, sampling rate, and padding fields in the header, the length of the frame, and therefore of its audio data, can also be calculated.¹
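To make the "proper mathematical calculations" concrete, here is a minimal Python sketch that decodes a four-byte header and computes the frame length.  It assumes MPEG-1 Layer III only (the common case for .mp3 files), where the length is 144 × bitrate / sample rate, truncated, plus one byte when the padding bit is set.  The function name and the decision to reject every other version and layer are illustrative assumptions, not part of any particular tool.

```python
# MPEG-1 Layer III lookup tables (assumption: other versions/layers are out of scope here).
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES_HZ = [44100, 48000, 32000, None]  # index 3 is reserved


def mpeg1_layer3_frame_length(header: bytes) -> int:
    """Return the total frame length in bytes (header included) for an MPEG-1 Layer III header."""
    if len(header) < 4:
        raise ValueError("need 4 header bytes")
    h = int.from_bytes(header[:4], "big")
    if (h >> 21) & 0x7FF != 0x7FF:      # bits 31..21: the 11 sync bits
        raise ValueError("missing 11-bit sync")
    version = (h >> 19) & 0x3           # 0b11 means MPEG-1
    layer = (h >> 17) & 0x3             # 0b01 means Layer III
    if version != 0b11 or layer != 0b01:
        raise ValueError("this sketch only handles MPEG-1 Layer III")
    bitrate = BITRATES_KBPS[(h >> 12) & 0xF]
    sample_rate = SAMPLE_RATES_HZ[(h >> 10) & 0x3]
    if bitrate is None or sample_rate is None:
        raise ValueError("free/bad bitrate index or reserved sample rate")
    padding = (h >> 9) & 0x1
    # 144 * bitrate / sample_rate, truncated, plus one padding byte if flagged.
    return 144 * bitrate * 1000 // sample_rate + padding
```

For example, the header bytes FF FB 90 00 (128 kbit/s, 44.1 kHz, no padding) give a 417-byte frame; setting the padding bit makes it 418.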

An MP3 file can be made up of thousands of these frames, which is the primary reason why streaming MP3 audio works, or why partially downloaded MP3 files can still be played.  Because each frame header begins with a special signature, the audio player searches for that pattern to find the start of each audio block.  This pattern, a series of 11 set bits, is called the "sync frame," and each frame in the data stream contains it.
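The search described above can be sketched in a few lines of Python.  Since frame headers start on byte boundaries, 11 set bits means a 0xFF byte followed by a byte whose top three bits are set (mask 0xE0).  The helper name `find_sync_offsets` is hypothetical; it only reports candidates, and a real player would go on to validate the rest of each header.

```python
def find_sync_offsets(data: bytes) -> list[int]:
    """Return every byte offset where the 11-bit sync pattern (0xFF, then top 3 bits set) starts."""
    offsets = []
    for i in range(len(data) - 1):
        # First byte supplies 8 set bits, the next byte's high 3 bits supply the remaining 3.
        if data[i] == 0xFF and (data[i + 1] & 0xE0) == 0xE0:
            offsets.append(i)
    return offsets
```

For instance, `find_sync_offsets(b"\x00\xff\xfb\x90\x00\xff\xe0")` reports offsets 1 and 5.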

Once a frame header is found, the length of the audio data trailing it can be determined.  The audio player then reads that calculated amount of data and processes it appropriately; any data outside of the frame can be ignored.  As a side note, Huffman coding is one of the stages used to encode and decode the audio data.²,³
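The frame-to-frame walk a player performs can be sketched as follows, again assuming MPEG-1 Layer III only.  Whenever a valid header is found, the whole frame is skipped; otherwise the walker advances one byte at a time, and every skipped range is recorded as out-of-band.  The names `frame_length` and `out_of_band_ranges` are hypothetical.  Note that an ID3 tag at either end of a real file would also show up as such a range.

```python
from typing import Optional

BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES_HZ = [44100, 48000, 32000, None]


def frame_length(data: bytes, pos: int) -> Optional[int]:
    """Frame length if a valid MPEG-1 Layer III header starts at pos, else None."""
    if pos + 4 > len(data):
        return None
    h = int.from_bytes(data[pos:pos + 4], "big")
    # 11 sync bits, then version 0b11 (MPEG-1) and layer 0b01 (Layer III).
    if (h >> 21) != 0x7FF or ((h >> 17) & 0xF) != 0b1101:
        return None
    bitrate = BITRATES_KBPS[(h >> 12) & 0xF]
    rate = SAMPLE_RATES_HZ[(h >> 10) & 0x3]
    if bitrate is None or rate is None:
        return None
    return 144 * bitrate * 1000 // rate + ((h >> 9) & 1)


def out_of_band_ranges(data: bytes) -> list[tuple[int, int]]:
    """Walk frame to frame and return (start, end) byte ranges that belong to no frame."""
    gaps, pos, gap_start = [], 0, None
    while pos < len(data):
        length = frame_length(data, pos)
        if length is not None and pos + length <= len(data):
            if gap_start is not None:   # a gap just ended at this frame
                gaps.append((gap_start, pos))
                gap_start = None
            pos += length               # skip the whole frame
        else:
            if gap_start is None:
                gap_start = pos
            pos += 1                    # resynchronize byte by byte
    if gap_start is not None:
        gaps.append((gap_start, len(data)))
    return gaps
```

Given two 417-byte frames with the six bytes `hidden` between them, the walker reports the single range (417, 423), which is exactly where the extra data sits.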

Hiding Information

Since audio players are only concerned with replaying audio data, anything outside of the frame is ignored.

This is merely insurance for protecting the aural pleasure of the listener.  Treating such out-of-band data as something meant to be heard would be a gross mistake, and the result would be a rather despicable symphony of squeaks and squawks.  This means that, if an audio player is implemented correctly, any data that exists between frames should not be played.  Therefore, information can be hidden by placing it between audio frames.  While not truly a form of audio steganography, hiding information between frames is a quick and easy means to stash away data.  True forms of audio steganography, on the other hand, rely on hiding information in the audio bits themselves.⁶,⁷
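A minimal sketch of the stuffing step, assuming the MP3 has already been split into a list of complete frames: the payload is cut into chunks and one chunk is placed after each frame.  The helper `stuff_between_frames` and its `chunk_size` parameter are hypothetical and are not taken from MP3nema.  The payload is assumed to be free of anything that resembles a frame header, a concern discussed further below.

```python
def stuff_between_frames(frames: list[bytes], payload: bytes, chunk_size: int = 64) -> bytes:
    """Rebuild the stream with one payload chunk appended after each frame (hypothetical helper)."""
    chunks = [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)]
    if len(chunks) > len(frames):
        raise ValueError("payload needs more frames or a larger chunk_size")
    out = bytearray()
    for i, frame in enumerate(frames):
        out += frame                 # the audio frame itself is left untouched
        if i < len(chunks):
            out += chunks[i]         # out-of-band data that a correct player skips over
    return bytes(out)
```

With placeholder frames `[b"F1", b"F2", b"F3"]`, payload `b"abcdef"`, and a chunk size of 4, the result is `b"F1abcdF2efF3"`.  Spreading the payload thinly keeps each gap small, which a player resynchronizes past quickly.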

If someone is actively looking for such out-of-band data, it is easy to find.

For instance, someone might analyze the audio file, or stream, and compare the frame count, frame sizes, and ID3 information tags to the actual file size.  If the sizes do not correlate properly, chances are that there is some extra data hiding underneath the covers.  Likewise, if the audio player tries to play all data, or if the out-of-band data has the same signature as an MP3 frame header, some rather obnoxious sounds might emerge.
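The size comparison can be sketched as simple bookkeeping: add up the frame lengths plus any ID3 tags, and subtract the total from the file size.  This sketch relies on two well-known tag layouts: an ID3v2 tag at the start of the file (10-byte header, a 28-bit "syncsafe" size, and an optional 10-byte footer) and a fixed 128-byte ID3v1 tag beginning with `TAG` at the end.  The function names are hypothetical, and the frame lengths are assumed to come from a frame walk such as the one a player performs.

```python
def id3v2_total_size(data: bytes) -> int:
    """Size of a leading ID3v2 tag (header + body + optional footer), or 0 if absent."""
    if len(data) < 10 or data[:3] != b"ID3":
        return 0
    b = data[6:10]
    body = (b[0] << 21) | (b[1] << 14) | (b[2] << 7) | b[3]  # 7 usable bits per byte
    footer = 10 if data[5] & 0x10 else 0                      # footer-present flag
    return 10 + body + footer


def unexplained_bytes(data: bytes, frame_lengths: list[int]) -> int:
    """Bytes not accounted for by audio frames or ID3 tags; non-zero hints at hidden data."""
    accounted = id3v2_total_size(data) + sum(frame_lengths)
    if len(data) >= 128 and data[-128:-125] == b"TAG":        # trailing ID3v1 tag
        accounted += 128
    return len(data) - accounted
```

A result of zero means every byte is explained; any positive remainder is a strong hint that something is tucked between the frames.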

As previously mentioned, audio players look for a signature that prefixes and describes a following block of data.

Such a signature begins with the first 11 bits all set.

Certain portions of the remaining 21 bits of the header can be used to validate that the frame and the data following it are audio.

For instance, if a particular bit sequence is defined that does not equate to a valid bit rate or sample rate, chances are that the data is out-of-band.  What would happen if one were intentionally hiding information between frame headers, and a segment of that to-be-hidden data contained the same bit-signature as a frame header?
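The validation idea can be sketched by rejecting any candidate header whose fields hold reserved or invalid values: the reserved MPEG version (0b01), the reserved layer (0b00), the "bad" bitrate index (0b1111), the reserved sample-rate index (0b11), and the reserved emphasis value (0b10).  The function name `looks_like_valid_header` is hypothetical, and real decoders may apply further checks, such as confirming that another header follows at the computed frame length.

```python
def looks_like_valid_header(h4: bytes) -> bool:
    """Return False if the candidate header lacks the sync bits or uses a reserved field value."""
    if len(h4) < 4:
        return False
    h = int.from_bytes(h4[:4], "big")
    if (h >> 21) != 0x7FF:                 # the 11 sync bits must all be set
        return False
    version = (h >> 19) & 0x3              # 0b01 is reserved
    layer = (h >> 17) & 0x3                # 0b00 is reserved
    bitrate_index = (h >> 12) & 0xF        # 0b1111 is invalid
    sample_rate_index = (h >> 10) & 0x3    # 0b11 is reserved
    emphasis = h & 0x3                     # 0b10 is reserved
    return (version != 0b01 and layer != 0b00 and bitrate_index != 0b1111
            and sample_rate_index != 0b11 and emphasis != 0b10)
```

A real header such as FF FB 90 00 passes, while four bytes of 0xFF fail on the bitrate index even though the sync bits are present.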

Well, if the audio tool did not do the proper calculations on data in the header (e.g. bit rate/sample rate values), that block of data might be played as audio.  Such a case might also occur if, for some reason, the stars all align properly and the hidden data just happens to look like a valid MP3 header, sync bits and all.

Such cases can be avoided if the hidden data never contains any pattern that looks like an MP3 sync frame, so anyone stuffing data between frames must take care not to replicate that signature.

One simple solution is to encode the data beforehand in a manner that cannot mimic a sync frame.  Such an encoding scheme should never produce a run of 11 set bits.

In fact, because frame headers start on byte boundaries, a sync frame always begins with a byte whose bits are all set (0xFF).  If one can avoid passing an entire byte with all bits set, a sync frame would never appear.
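A quick way to check a payload before stuffing it is to list every 0xFF byte, since each one could begin a byte-aligned sync pattern.  The helper name `sync_hazards` is hypothetical.  One edge case worth hedging: if the preceding frame's audio data happens to end in 0xFF, a payload whose first byte is 0xE0 or higher could complete a false sync across the boundary, so the first byte deserves a look as well.

```python
def sync_hazards(payload: bytes) -> list[int]:
    """Offsets of 0xFF bytes; each could start a byte-aligned 11-bit sync pattern."""
    return [i for i, b in enumerate(payload) if b == 0xFF]


def starts_risky(payload: bytes) -> bool:
    """True if the first byte has its top 3 bits set and could finish a sync begun by the previous frame."""
    return len(payload) > 0 and (payload[0] & 0xE0) == 0xE0
```

A 7-bit encoding avoids both problems at once, because every byte stays below 0x80.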

Plain ole' ASCII text is a perfect example of such an encoding, as it only uses 7 bits to encode each character, leaving the high bit of every byte clear.⁴

The uuencode tool helps with this trick, transforming arbitrary binary data into printable 7-bit ASCII.⁵
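Python's standard `binascii` module exposes the same line encoding that uuencode uses, which makes for a short sketch of the trick.  The helpers below encode 45 input bytes per line (the conventional uuencode line size) and decode them again.  They produce only the body lines; the `begin`/`end` wrapper lines of a full uuencoded file are omitted here for brevity, and the helper names are hypothetical.

```python
import binascii


def uuencode_body(data: bytes) -> bytes:
    """Encode data as uuencoded lines of 45 input bytes each (7-bit ASCII only)."""
    return b"".join(binascii.b2a_uu(data[i:i + 45]) for i in range(0, len(data), 45))


def uudecode_body(encoded: bytes) -> bytes:
    """Reverse uuencode_body, line by line."""
    return b"".join(binascii.a2b_uu(line) for line in encoded.splitlines())
```

Encoding all 256 possible byte values yields output in which every byte is below 0x80, so no 0xFF byte, and therefore no sync frame, can appear; decoding restores the original bytes exactly.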

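It should be mentioned that 7-bit encoding of raw data will result in a file larger than the original; it is not a compression technique.  The raw data can, however, be compressed before it is encoded to offset some of that growth, since compressing the encoded output would reintroduce bytes with the high bit set.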
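How much larger? Each uuencoded line carries up to 45 input bytes as one length character, four output characters per three input bytes, and a newline, so a full line is 1 + 60 + 1 = 62 bytes.  The hypothetical helper below computes the body size for n input bytes under that layout (ignoring the `begin`/`end` lines).

```python
def uu_body_size(n: int, line_bytes: int = 45) -> int:
    """Bytes produced by uuencoding n bytes in lines of line_bytes input bytes (body only)."""
    def line_size(k: int) -> int:
        return 1 + 4 * ((k + 2) // 3) + 1   # length char + encoded chars + newline

    full, rest = divmod(n, line_bytes)
    return full * line_size(line_bytes) + (line_size(rest) if rest else 0)
```

The growth settles at 62/45, roughly 38 percent, so a 4 GB file would swell to about 5.5 GB once uuencoded.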

Tool: MP3nema

The MP3nema tool has been produced to aid in stuffing and extracting data between frames.

The original intent of this application was to analyze MP3s, both static and streamed, for out-of-band data.  However, testing such analysis required that a valid test case be created to ensure detection.  In other words, we needed to inject data between frames so that we could verify that the tool was working properly.

After some time, the main focus of development shifted from data detection to actual data hiding and recovery, and now this tool can covertly pack data into a series of MP3s for distribution.

However, if someone desired to covertly distribute a movie, for example a completely legit HD-quality video, they probably would not want to stuff it all into a three-minute, 3 MB audio file.  "Wow, this song is really boring; lots of large pauses."  In fact, assume for humor's sake that one were distributing this perfectly legit movie inside a perfectly non-legit audio file: a 4 GB movie would take quite a while to distribute, especially if it were uuencoded, which increases its size.  Not to mention that the three-minute song would be of a rather curious size.

Conclusion

While hiding data between frames, rather than in the audio itself, is less a testament to steganography, it is simple to do.

Such a method allows for data to be quickly extracted as the media is being played/streamed.

One potential use for this technology, however outlandish it might appear, could be to bypass firewalls that prevent access to outside email (e.g. streaming uuencoded email in tracks of music).  Even cooler would be to associate email senders with a particular musical artist and stream that data.  "Aww man... Sting again; this is great!  Ohh wait, it must be that chick Roxanne sending me emails about how I can improve my performance."

References

1. Gabriel Bouvigne, "Frame Header," MP3'Tech, 2001.

2. MPEG, "Coding of Moving Pictures and Associated Audio for Digital Storage Media at Up to About 1.5 Mbit/s - Part 3: Audio (Draft)," ISO/IEC, November 22, 1991.

3. "MP3," Wikipedia, 2008.

4. "ASCII," Wikipedia, 2008.

5. "Uuencoding," Wikipedia, 2008.

6. Fabian Petitcolas, "MP3Stego," January 2008.

7. Mark Noto, "MP3Stego: Hiding Text in MP3 Files," SANS Institute, 2001.

MP3nema: An MP3 analysis, data capturing, and data hiding utility.  github.com/enferex/mp3nema (mp3nema-0.4.tar.gz)
