Steganographic Filesystems

by Chimera Manicore

People are worrying a lot about data privacy and security these days.  The use of strong encryption is one popular method to ensure private data stays that way.

Encryption is not always the best way to protect data.  One can imagine many scenarios where the existence of encrypted files might be considered compromising by a third-party.

According to some, the level of plausible deniability is diminished when you encrypt a file - after all, why would one bother unless one has something to hide?  Or at least so the argument goes.  Grand Inquisitor Torquemada would likely be unimpressed with your assurances that the encrypted file is really a picture of your grandmother, and so might coerce you to decrypt the file.  A plain old subpoena might have a similar effect.

This is where steganography comes in.  Steganography refers to the art of hiding in plain sight, which in this context is the practice of concealing private data of some sort inside public data of some other sort, thereby evading the attention of snooping eyes.  For example, a message could be hidden inside an image, or an image could be hidden inside an audio clip, or encoded in a blog or broadcast.  The trick is to make the envelope appear as normal and innocuous as possible.

    ------>    

It turns out that's a nontrivial task.  Regardless of the types of envelope and payloads one chooses, the fact that we're embedding a file within another will introduce anomalies that can be detected by an analyst.  In other words, we have a cat and mouse game between steganographers and steganalysts similar to that between cryptographers and cryptanalysts.  The moves of that game are often very complex and beyond the scope of this article, but suffice it to say the strongest methods often result in relatively low bandwidth in terms of letter-to-envelope size ratios.  That in itself can be a problem.  After all, how many pictures of your grandmother can you have without arousing suspicion?  And why do all your Bach cantatas sound a bit off?  Another similarity with cryptography is that if the analyst figures out how to extract the message from the envelope, there's no longer a question of plausible deniability.

The game is over.

Encryption and steganography used in isolation are indirect proof that someone has tried to hide something.  Worse, if somebody coerces the legitimate owner to reveal the cryptographic key or a steganographic algorithm, the cat is out of the box.  This is the core problem that a steganographic file system addresses .

The basic idea is to embed encrypted data inside a very large volume of random data in such a way that it's impossible to determine what data (if indeed any!) has been embedded.  In addition, it should provide a method to extract individual pieces of embedded data without revealing if there is additional embedded data.  Since random data is by definition random, the concept of an anomaly evaporates, and an analyst has no reference point from where to begin to unravel the mystery.  The analyst would see a large amount of random data, but won't be able to determine what if anything at all has been embedded.  When confronted with waterboards, rubber hoses, or court orders, the legitimate owner of the data can choose to reveal some of it, while keeping the balance secret and yet maintain plausible deniability.  For example, when coerced as described above, the owner could choose to reveal the nuclear launch codes while still keeping the video of that embarrassing karaoke night at the Singapore Metropole from the public eye.

By now the reader is likely thinking "I really need one of those!"  Fair enough.  Let's build a proof-of-concept steganographic file system from scratch.

The first thing we need is a very large volume of random data.  In a production strength system, this would likely be an entire disk partition filled with random data, but for the purposes of this article a large file will do.  On a Linux system, you can very easily create a large file of random data using the dd command.  At your option, you can also just as easily overwrite your system with random data and turn it into a fancy doorstop which might not be what you desired, so the preferred method is to create the file on removable media such as a USB drive.

Assuming the USB drive is mounted as /media/x/y, you simply run:

$ dd if=/dev/urandom of=/media/x/y/bigfile bs=1M

When the command completes, you have a huge file of random data taking up all the free space on the USB drive.  We are now ready to direct our attention to Listing 1.  The entry point to the program is indicated around line 86.  The first thing we need to do is calculate the number of blocks we have available using the actual size of the file and a fixed block size.

We also indicate the file we want to embed and a secret key associated with that file.  The next step is to add the file we want to embed to the file system.

The algorithm for doing that is implemented in the function writefsys().  Here's a high level summary:

First, we calculate the ID of the starting block by calculating the SHA-256 hash of the secret key and the file name modulo the number of available blocks.  We then encrypt the first block of data and write it to the block.  Our encryption algorithm is a simple-as-spit XOR operation with the key.  In a production grade system, we'd more likely use something like AES.  We then calculate the ID of the next block by calculating the SHA-256 hash of the previous block, and then encrypting and writing the data.  And so on with the next block, and the next.  The net effect of this is that the encrypted data is randomly distributed over the entire file system, and unless one knows the secret key and the file name, there's no reasonable way to collect it.  We can embed multiple files in the same file system simply by calling writefsys() again with a different file name and secret key, the only requirement being that the combination of filename and key are unique.

When we want to retrieve the file, we simply run the process in reverse, using the function readfsys().  We calculate the ID of the starting block using the key and filename just like before.  Then we read that block and decrypt it.  Like before, the hash of the SHA-256 hash will give us the next ID and so on.  The uniqueness of secret key and file name guarantees that the extraction of one file does not reveal any of the other files or even show their existence.

The proof-of-concept can only store individual files, but it's trivial to extend it to support other common features of file systems such as directory trees, symbolic links, and inodes.

The demo code has one problem that has to do with the nature of hashing.  Multiple entries can have the same hash value, and this would result in collisions where later entries will overwrite data already stored.  Similar to a birthday paradox, this is more likely than one would naively guess.  A solution to this issue is to store each encrypted datum in multiple blocks along with a validity checksum using several different hash values. When extracting files, we would only consider blocks that have a valid checksum.  This scheme will naturally reduce the number of files that can be stored, but will protect the integrity of the data.  The demo code does not have this feature for reasons of clarity.

Although this is an emerging field, there are currently several public implementations of steganographic file systems.

A popular choice is StegFS.

This is a user-space file system for Linux based on FUSE written in C and offers the robustness and performance that our demo's 100 lines of Python can't hope to match .  (See github.com/albinoloverats/stegfs for details.)  Another interesting architecture is Mnemosyne, a peer-to-peer distributed steganographic file system.  (The white paper is at www.cl.cam.ac.uk/research/srg/netos/papers/2002-mnemosyne-iptps.pdf.)

Shouts to John and Kirk.

Heloise,
Could I have imagined that a letter not written to yourself could have
fallen into your hands, I had been more cautious not to have inserted
anything in it which might a waken the memory of our past misfortunes. I
described with boldness the series of my disgraces to a friend, in order
to make him less sensible of the loss he had sustained. If by this well
meaning artifice I have disturbed you, I purpose here to dry up those
tears which the sad description occasioned you to shed: I intend to mix
my grief with yours, and pour out my heart before you; in short, to lay
open be fore your eyes all my trouble, and the secrets of my soul, which
my vanity has hitherto made me conceal from the rest of the world, and
which you now force from me, in spite of my resolutions to the contrary.
import os
from hashlib import sha256

# Calculate the block id and hash
def calcblock(bts, numblocks):
    hsh = sha256(bts)
    block = int.from_bytes(hsh.digest(), byteorder='little', signed = False) % numblocks
    return block, bytes(hsh.hexdigest(), "utf-8")

# does what the name says
def encryptdata(value, pwd, blocksize):
    # SHA-256 hash of the password will always be 32 bytes
    pwsh = sha256(bytes(pwd, "utf-8")).digest()
    if (len(value) < blocksize):
        # If too short pad with spaces
        val = value.decode("utf-8").ljust(blocksize)
        value = bytes(val, "utf-8")
    return bytes(a ^ b for a, b in zip(value, pwsh))

# does what the name says
def decryptdata(value, pwd, blocksize):
    # SHA-256 hash of the password will always be 32 bytes
    pwsh = sha256(bytes(pwd, "utf-8")).digest()
    if (len(value) < blocksize):
        # If too short pad with spaces
        val = value.decode("utf-8").ljust(blocksize)
        value = bytes(val, "utf-8")
    try:
        val = bytes(a ^ b for a, b in zip(value, pwsh)).decode("utf-8")
    except:
        val = "End of message"
    return val

def writefsys(fsys, fname, pwd, blocksize, numblocks):
    # Calculate the first block as the hash of filename+passphrase
    block, hshval = calcblock(bytes(pwd + fname, 'utf-8'),numblocks)
    outf = open(fsys, "r+b")
    with open(fname, "rb") as inf:
        while True:
            value = inf.read(blocksize)
            if value == b'':
                break # end of file
            #print("Writing to block " + str(block)
            #print(value.decode("utf-8"))
            byts = encryptdata(value, pwd, blocksize)
            outf.seek(block * blocksize)
            outf.write(byts)
            # the next block is based on hash of this block's hash 
            block, hshval = calcblock(hshval, numblocks)
    # This is just a bespoke end of file marker
    value = bytes("End of message".ljust(blocksize), "utf-8")
    #print("Writing to block " + str(block))
    #print (value.decode("utf-8"))
    byts = encryptdata(value, pwd, blocksize)
    outf.write(byts)
    inf.close()
    outf.close()

def  readfsys(fsys, fname, pwd, blocksize, numblocks):
    value = ""
    rc = ""
    # Calculate the first block as the hash of filename+passphrase
    block, hshval = calcblock(bytes(pwd + fname,'utf-8'), numblocks)
    with open(fsys, "rb") as inf:
        while True:
            #print ("Reading from block " + str(block))
            inf.seek(block * blocksize)
            binarydata = inf.read(blocksize)
            value = decryptdata(binarydata, pwd, blocksize)
            if value.startswith("End of message"):
                break
            rc += value
            # the next block is based on hash of this block's hash
            block, hshval = calcblock(hshval, numblocks)
    inf.close()
    return rc

# Entry point to program
if __name__ == '__main__':
    # The file which contains the file system
    fsys = "/media/x/y/bigfile"
    # A file with the message that must remain secret
    fname = "./secretmessage.txt"
    # A secret passphrase
    pwd = "To Heloise"
    # The blocksize we're using (bytes)
    bsz = 32
    # Size of the fsys file
    fsz = os.path.getsize(fsys)
    # number of blocks in the file system
    blknum = int(fsz/bsz) - 1
    # Write the file contents to the filesystem
    writefsys(fsys, fname, pwd, bsz, blknum)
    # Read it back
    msg = readfsys(fsys, fname, pwd, bsz, blknum)
    # Display the message
    print(msg)

Code: steganographic-filesystem.py

Return to $2600 Index