Using "DeepChecksum" to Ensure the Integrity of Backups

by 75ce8d3ff802ff42

The U.S. Department of Defense describes five pillars of information security:

  • Confidentiality
  • Integrity
  • Availability
  • Authentication
  • Non-repudiation

When dealing with messaging systems, we take great pains to use platforms and services that provide all five pillars, but many of us (including me, until recently) don't take "integrity" into account when dealing with data backups.  If you're like me, you worked hard to make sure that no one could read your backups, but didn't give much thought to someone adding invalid data to them.  After all, why would someone want to plant malicious data in a backup set full of files that they can't even read?  Several reasons:

  • Adding illegal content to a backup could get you in a whole world of legal trouble (depending upon your jurisdiction).
  • Adding content deemed "inappropriate" by your local culture/family/circle-of-friends could cause significant loss-of-face.
  • If your backup contains executable software, a malicious actor could overwrite an executable (that they cannot read) with malware that your machine will execute when the backup is restored.

Now that we've established that backup integrity is important, let's talk about how you can protect yourself with a cryptographic record of what files were included in the backup.

Note:  If you're using an automated backup system or backing up to an encrypted drive (which you should be doing), then this system might be unnecessary/overkill/redundant.  If that's the case, consider this a safety-net/airbag/extra-protection/fun-exercise.

The goal of this mini-project (which I call "DeepChecksum") is to create a text file containing the cryptographic hashes of every file in the directory tree being backed up.  You can then store that file somewhere safe or (better yet) sign it with your PGP key.  Fortunately, there's a Linux utility for this!

Enter: Hashdeep (originally called "md5deep").

From the man page:

"Hashdeep ... computes multiple hashes, or message digests, for any number of files while optionally recursively digging through the directory structure"

That's exactly what we want!

Note:  Hashdeep may be called "md5deep" in your distribution's repository.

The Hashdeep package should contain several executables that do the same operation with different hashing algorithms (sha256deep, whirlpooldeep, etc.).  Use whichever version you prefer.  Read the man pages for syntax details.
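Because the deepchecksum function below calls five of these programs, it can help to check which ones your distribution actually installed.  Here is a minimal sketch in fish; the function name hashdeep_tools is hypothetical, and the tool list simply mirrors the programs used by deepchecksum plus hashdeep itself.  It relies only on fish's built-in `type -q`, which succeeds when a command is found.

```fish
# Hypothetical helper (not part of hashdeep): report which programs from the
# hashdeep/md5deep package are available on your PATH.
function hashdeep_tools --description "List which hashdeep programs are installed"
  for tool in md5deep sha1deep sha256deep tigerdeep whirlpooldeep hashdeep
    # type -q prints nothing; only its exit status tells us if the tool exists
    if type -q $tool
      echo "$tool: installed"
    else
      echo "$tool: missing"
    end
  end
end
```

Run hashdeep_tools once after installing the package.  Any "missing" line means deepchecksum cannot produce that algorithm's hash list until you install the tool or remove it from HASH_FUNCTIONS.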

Being a 2600 reader, I prefer to automate whatever I can, so I created a simple fish function to automate the entire process for me.  (Fish is an alternative shell that I prefer to Bash.  Translating this script to Bash is left as an exercise for the student.)

Just save this function as deepchecksum.fish in ~/.config/fish/functions/ to make the deepchecksum command available anywhere.

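function deepchecksum --description="Uses hashdeep tools to create checksums of the current directory"
  # Date stamp for the output file names: ISO date plus weekday name
  set -l DATE (date +%F_%A)
  # Current directory's name, with spaces replaced by underscores
  set -l BASEDIR (basename "$PWD")
  set -l BASEDIR (string replace -a ' ' _ "$BASEDIR")
  # One hash list per algorithm; each of these tools must be installed
  set -l HASH_FUNCTIONS md5deep sha1deep sha256deep tigerdeep whirlpooldeep
  # Output directory, e.g. Signatures_for_Hacking_stuff
  set -l SIG_DIR Signatures_for_{$BASEDIR}

  mkdir -p $SIG_DIR

  for HASH in $HASH_FUNCTIONS
    # -r recurses into subdirectories and -l prints relative paths.  Because
    # SIG_DIR lives inside ".", earlier hash lists get hashed as well.
    "$HASH" -rl . > "$SIG_DIR"/"$BASEDIR"_"$HASH"_"$DATE"
  end
end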

Now if you're in the directory /home/1337haxor/Documents/Hacking_stuff/ and run deepchecksum, you'll get the directory /home/1337haxor/Documents/Hacking_stuff/Signatures_for_Hacking_stuff/ containing text files with the hashes of every file in Hacking_stuff (including its subdirectories), computed with MD5, SHA-1, SHA-256, Tiger, and Whirlpool.

Each file name includes the date the file was created and, as an added bonus, each file contains the hashes of all hash files created before it.

Just sign these files, then use diff to compare them against the output of later deepchecksum runs to detect any modifications!
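Both steps can be scripted.  The sketch below is a hypothetical pair of fish helpers (the names deepchecksum_sign and deepchecksum_compare are illustrative, not part of hashdeep).  The first creates an ASCII-armored detached signature (FILE.asc) for each hash list with `gpg --armor --detach-sign`, which you can later check with `gpg --verify FILE.asc FILE`.  The second sorts two hash lists made by the same algorithm before diffing them, so a mere change in line order is not mistaken for a modification.  A newer list also contains entries for hash lists (and signatures) created after the older one, so expect a few Signatures_for_ lines in the diff even when your data is untouched.

```fish
# Hypothetical helpers; the names are illustrative, not part of hashdeep.

# Sign every hash list in a Signatures_for_* directory with your default
# GnuPG key, writing an ASCII-armored detached signature FILE.asc beside it.
function deepchecksum_sign --description "Detach-sign each hash list in a directory"
  if test (count $argv) -ne 1; or not test -d $argv[1]
    echo "usage: deepchecksum_sign SIGNATURES_DIR" >&2
    return 2
  end
  for f in $argv[1]/*
    # Skip existing signatures so they are not signed themselves
    if not string match -q -- '*.asc' $f
      gpg --armor --detach-sign $f; or return 1
    end
  end
  return 0
end

# Compare two hash lists made with the same tool (e.g. two sha256deep files).
# Both are sorted first so that only real differences show up.  The exit
# status is diff's: 0 if identical, 1 if they differ, 2 on trouble.
function deepchecksum_compare --description "Diff two hashdeep-style hash lists"
  if test (count $argv) -ne 2; or not test -r $argv[1]; or not test -r $argv[2]
    echo "usage: deepchecksum_compare OLD_LIST NEW_LIST" >&2
    return 2
  end
  # psub turns each sorted stream into a temporary file that diff can read
  diff (sort $argv[1] | psub) (sort $argv[2] | psub)
end
```

For example, `deepchecksum_compare Signatures_for_Hacking_stuff/OLD_FILE Signatures_for_Hacking_stuff/NEW_FILE`, where OLD_FILE and NEW_FILE stand for two sha256deep lists from different dates.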

Happy hacking!  Stay safe out there.

