Hacking Your Mother Tongue to Obfuscate Your Encryption

by Israel

When most of us think about encryption these days, we think of complicated mathematical algorithms that hide data.  When we think of breaking encryption, most of us probably picture brute-force programs and clusters of high-powered computers.  This was not the case before the computer generation came along, as encryption dates back to the time of Caesar.  In the not-so-distant past, people broke encryption with nothing but pen, paper, and their heads.

In the case of the English language, there are hints that may give someone breaking encryption an advantage.  For older or simpler encryption, such as a substitution cipher, the first thing you would look for is the character that occurs the most, because it most likely stands for the letter e.  The letter e occurs more frequently than any other letter in the English language, and it's easy to see why.  Take into consideration the following words:

  1. Me
  2. Meet
  3. Met
  4. Close

Here are four common examples of how the letter e is usually used.

In Example 1, there is the hard e sound.  In Example 2, there is the hard e sound again, but spelled with a double e.  In Example 3, we have the soft e sound.  In Example 4, the e is silent and does not make any sound.  The letter e kind of runs rampant in English when you really think about it.
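To make the frequency trick concrete, here is a minimal sketch of how it could be used against a Caesar-style shift.  The function names and the sample sentence are my own inventions for illustration, and the guess only works when the text is long enough for e to actually come out on top.

```python
from collections import Counter

def letter_frequencies(text):
    """Count letters only, ignoring case, most common first."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    return Counter(letters).most_common()

def caesar_shift(text, shift):
    """Shift each ASCII letter forward by `shift` places, wrapping around."""
    out = []
    for ch in text.lower():
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        else:
            out.append(ch)
    return "".join(out)

def guess_shift(ciphertext):
    """Assume the most common ciphertext letter stands for 'e'."""
    top_letter = letter_frequencies(ciphertext)[0][0]
    return (ord(top_letter) - ord("e")) % 26

# Made-up sample sentence, deliberately heavy on the letter e.
sample = "Meet me where the three trees were, then close the gate"
secret = caesar_shift(sample, 3)

print(letter_frequencies(sample)[0])
print(guess_shift(secret))
```

With this sample, e is the most common letter in the plaintext, so the most common letter in the shifted text gives away the shift.  On a short or unusual message the guess can easily be wrong.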

Before I proceed, please do not feel intimidated by what I'm about to suggest.

I am not asking you to learn Russian fluently, nor any other language.  Sadly, I myself am not fluent in anything besides English.  We are merely going to talk about some of the concepts of Russian as examples to use in obfuscation.  Did you know that in Russian schools there are no spelling classes for any of the grades?  Imagine what we could have learned during the time wasted on an hour of spelling every day.

The reason behind this is that in Russian, everything sounds exactly as it is spelled.  The Russian alphabet uses 33 characters, whereas the English alphabet uses 26.  Quick searches online can show you how their alphabet can easily be used to spell out English words.  With one site, I found I was able to start using their symbols to read English words in about 20 minutes.  However, the sentence structure of Russian is very different from that of English.

For example, this sentence in English: This is a very old table.

Would translate, word for word, to the following in Russian: This very old table.

If we combine the English sentence structure with the characters of Russian, we can add a level of obscurity for anyone trying to break our code.  If a cracker were able to deduce we were using a Russian alphabet, they would most likely assume we were writing in Russian as well.  However, let's try to take this further.

So if we were to take the phrase This is the message and translate it to Russian, we would end up with ЭТО СООБЩЕНИЕ, which literally says This message.

However, this is not what we get when we transliterate the English letters into Cyrillic.  Instead we end up with ТХИС ИС ТЗ МЕССАГЕ.  Translated back as though it were real Russian, this phrase roughly comes out as This study the message.  Where did study come from?  This is what is known as getting lost in translation.  From the standpoint of obfuscation, this can be an advantage.
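The letter swapping itself is easy to sketch in code.  The table below is a hypothetical mapping I made up for illustration, not a standard transliteration scheme, and the whole-word entry for THE is an assumption added only so the output matches the phrase above.

```python
# Hypothetical letter table for illustration only, not a standard scheme.
LETTERS = {
    "A": "А", "B": "Б", "C": "К", "D": "Д", "E": "Е", "F": "Ф", "G": "Г",
    "H": "Х", "I": "И", "J": "Й", "K": "К", "L": "Л", "M": "М", "N": "Н",
    "O": "О", "P": "П", "Q": "К", "R": "Р", "S": "С", "T": "Т", "U": "У",
    "V": "В", "W": "В", "X": "КС", "Y": "Ы", "Z": "З",
}

# Whole-word shortcuts (assumed, so the output matches the article's phrase).
WORDS = {"THE": "ТЗ"}

def to_cyrillic(text):
    """Respell English text with Cyrillic letters, one word at a time."""
    out = []
    for word in text.upper().split():
        if word in WORDS:
            out.append(WORDS[word])
        else:
            # Characters missing from the table pass through unchanged.
            out.append("".join(LETTERS.get(ch, ch) for ch in word))
    return " ".join(out)

print(to_cyrillic("This is the message"))
```

Note that this table is lossy: C, K, and Q all land on the same Cyrillic letter, so a human reader has to lean on context to get back to the English.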

Also, note how these phrases all look totally different from each other:

ЭТО СООБЩЕНИЕ (Real Russian)

ТХИС ИС ТЗ МЕССАГЕ (English with Cyrillic)

This is the message (English Plaintext)

We have taken what would be two words in Russian and made them four.

The number of words would most likely be irrelevant, as many encryption schemes leave no empty spaces between characters.  Yet the real Russian phrase was 12 characters.  Our obfuscated phrase came to 15, and the original phrase This is the message has 16, not counting spaces.  While this is not a lot of difference, you can see how over a long text the length would differ greatly from the English or Russian versions of the plaintext.
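For anyone who wants to check the arithmetic, the counts can be reproduced with a few lines.  The helper name is my own, and spaces are deliberately left out of the count.

```python
def letter_count(phrase):
    """Count every character except whitespace."""
    return sum(1 for ch in phrase if not ch.isspace())

phrases = {
    "Real Russian": "ЭТО СООБЩЕНИЕ",
    "English with Cyrillic": "ТХИС ИС ТЗ МЕССАГЕ",
    "English Plaintext": "This is the message",
}

for label, phrase in phrases.items():
    print(label, letter_count(phrase))
```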

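In Russian, there are other characters we can use, such as the symbols Ы and Ь.  In real Russian, Ь is the soft sign, which softens the consonant before it, and Ы is a vowel in its own right.  For our hybrid, we can repurpose them as markers that denote whether the hard sound or the soft sound is going to be used with the letter following them.  So ЫА would be the hard a sound and ЬА would be the soft a.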

If we represented these two sounds as numbers, we would most likely give them two completely unique numbers and grow our alphabet even further.  To use mathematical algorithms later, we would probably be changing the characters into numbers anyway for computation.  Essentially, we could have each unique phonetic sound represented by its own number.  We could also go the other way and use one number to represent double consonants such as the Fr in Frank or the rk in Mark.  This simultaneous inflation and deflation of the number of characters used would add more complexity as well.
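Here is a rough sketch of the deflation half of that idea, where a consonant pair collapses into a single number.  The cluster list and the numbering are assumptions of mine; a real scheme would pick its own sound units, and hard and soft vowel pairs could be added to the same table to get the inflation half.

```python
import string

# Hypothetical sound units that collapse two letters into one number.
CLUSTERS = ["FR", "RK"]

# A=1 ... Z=26, then each cluster gets the next free number (FR=27, RK=28).
CODEBOOK = {
    unit: number
    for number, unit in enumerate(list(string.ascii_uppercase) + CLUSTERS, start=1)
}

def to_units(word):
    """Split a word into units, preferring a two-letter cluster when one fits."""
    letters = "".join(ch for ch in word.upper() if ch in CODEBOOK)
    units, i = [], 0
    while i < len(letters):
        pair = letters[i:i + 2]
        if pair in CLUSTERS:
            units.append(pair)
            i += 2
        else:
            units.append(letters[i])
            i += 1
    return units

def encode(word):
    """Turn a word into the list of numbers for its units."""
    return [CODEBOOK[unit] for unit in to_units(word)]

print(encode("Frank"))
print(encode("Mark"))
```

Frank has five letters but encodes to four numbers, and Mark goes from four letters to three, so the length of the number stream no longer lines up with the length of the English.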

I leave it as an exercise for the reader to create their own language hybrids.  Imagine something like the Chinese characters where whole words may be represented as one character.  I would love to see this added to something like the Spanish sentence structure where the verb of the sentence comes first.  You may even think of much better anomalies than I did!

After we have encoded all of our new plaintext into our Cyrillic obfuscated text and then into numbers, we can proceed with real algorithms and modern cryptography.  I would like to show how this can be applied to modern cryptography, but the encryption laws in my country are rather strict when it comes to export.

On the other side of the coin, readers in some parts of the world have very strict laws on the importation of encryption as well.  While I see punishment for sharing what I have created for myself as a gross violation of free speech, I do not wish to endanger others without their consent because of my protest.  I would rather take this time to encourage people across the world to raise their voices and demand the freedom to express and share their own ideas.  I fear that one day soon, encrypted text may be the only freedom of speech or right to privacy we have left.

For those who may be in doubt of the effectiveness of this, let's observe history.  During the Second World War, the United States did not always use encryption in the usual sense.  In the Pacific, they transmitted communications using the Navajo Indians' native language.  The Japanese intercepted these communications and assumed this was English that had been encoded.  Code crackers worked hard for a decryption key that would never be found because it didn't exist.  This illustrates the point that language is powerful!  It can change minds and can even win wars.

In closing, not all English speakers may be familiar with the phrase "mother tongue" in the title.  This simply means the first language you learned.  I only know this phrase due to someone running a scam that kept spamming my work.

May they live long and prosper.