Vulnerabilities in Deep Artificial Neural Networks

by Thor R. Mirchandani

0 - Motivation

It should be no big revelation that Artificial Intelligence (AI) has become an essential part of life in a modern society, to the point that we take it for granted and assume that it's something that simply works.

The reality is that artificial intelligence today, in all except its most advanced and specialized guises, is not all that intelligent.

In general, AI is vastly inferior to "natural stupidity" of the form often encountered in humans and other organic life forms.  (The main exceptions to that statement are the classes of systems that routinely beat humans at Go, chess, and other games.)  Thus it's fairly safe to assume that whenever you encounter an AI system, on some level or in some way it is dumber than a bag of hammers, and, as everyone knows, where there's stupidity there is vulnerability.

This article focuses on a particular type of vulnerability found in many AI systems implemented using artificial neural networks.  The goal is to raise awareness that this type of vulnerability exists and to show how relatively easy it is to exploit, without providing a recipe for actually exploiting it.  Therefore we will only present a blueprint for a general methodology, not the details of specific attacks.

1 - Demystifying Artificial Neural Networks

An artificial neural network is organized in multiple layers of artificial neurons.

Each neuron has one or more inputs and a single output.

The outputs of neurons of lower layers are connected to the inputs of the neurons in higher layers.

Only a fraction of the output signals are passed on to the higher layers, and the size of this fraction is determined by multipliers called "weights."  The weights are trainable, and it is this property that gives artificial neural networks their power.  There are many well-known and efficient training algorithms.

One of the most common and easiest to understand is supervised training.

Here we present some form of data to the network's inputs and let the network "guess" what type of data it is.  If the answer is correct, we strengthen the weights associated with the input and the guess.  If it is incorrect, we weaken the weights that were involved in the bad decision.  If all goes well, the network gets better and better at inferring the correct output from the inputs until it achieves mastery, and we say it is trained.
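
To make that concrete, here is a minimal sketch of supervised training for a single artificial neuron, written in Python with NumPy.  Everything in it (the toy data, the learning rate, the number of inputs) is invented for illustration; a real network has many neurons arranged in layers and uses more sophisticated optimizers, but the strengthen-or-weaken idea is the same.

import numpy as np

# Toy data: four 3-feature samples and their known labels (1 = "yes", 0 = "no").
# The values and sizes here are made up purely for illustration.
X = np.array([[0.0, 0.1, 0.9],
              [0.2, 0.8, 0.1],
              [0.9, 0.1, 0.2],
              [0.1, 0.9, 0.8]])
y = np.array([1.0, 0.0, 0.0, 1.0])

rng = np.random.default_rng(0)
weights = rng.normal(size=3)   # one trainable weight per input
bias = 0.0
learning_rate = 0.5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Supervised training loop: guess, compare with the known answer,
# then nudge the weights toward a better guess.
for epoch in range(1000):
    guess = sigmoid(X @ weights + bias)          # the network's "guess"
    error = guess - y                            # positive if it guessed too high
    weights -= learning_rate * (X.T @ error) / len(y)
    bias -= learning_rate * error.mean()

print(sigmoid(X @ weights + bias).round(2))      # drifts toward [1, 0, 0, 1] as the weights are trained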

We should distinguish between the output and inputs of the neurons and the outputs and inputs of the network as a whole.  A neuron always has a single output, whereas the network can have more than one.  The distinction will be important to the example that follows in Section 3.

The specifics of how the neurons, layers, and weights are organized determine the functionality of the network and are beyond the scope of this article.

Instead, we point the interested reader to Aurelien Geron's book Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow for an excellent detailed introduction.  We also recommend the book Deep Learning Illustrated by Jon Krohn for a more gentle starting point.

2 - AI on Edge Devices

The training of a network is typically a very resource-intensive, one-time process.  For that reason, it is often done using clusters of rented high-performance hardware in the cloud.

On the other hand, once a network is trained, it doesn't take nearly as many computing resources to operate it.  In fact, many networks can run comfortably on single-board systems, such as Raspberry Pi, or even on microcontrollers, for example, some newer Arduino variants.  The result is that AI has become ubiquitous on edge devices.

Since such edge devices usually don't have the computing power necessary for doing training, the weights of the networks inside them are fixed and no longer trainable.  The network's run-time configuration is static.  This is part of the basis for the type of vulnerability we're discussing.
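
As a rough sketch of what that means in practice, imagine the trained weights shipped on the device as plain constants and used only in a forward pass.  The sizes, values, and activation functions below are made up; the point is simply that there is inference code but no training code anywhere on the device.

import numpy as np

# Hypothetical trained weights, shown here as literal constants to mimic
# values burned into read-only storage on the device (sizes are invented).
W1 = np.array([[ 0.8, -0.3],
               [-0.5,  0.9],
               [ 0.2,  0.4]])
b1 = np.array([0.1, -0.2])
W2 = np.array([[ 1.1],
               [-0.7]])
b2 = np.array([0.05])

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def infer(x):
    # Forward pass only: there is no optimizer and no weight update,
    # so W1, b1, W2, b2 never change after deployment.
    hidden = relu(x @ W1 + b1)
    return sigmoid(hidden @ W2 + b2)

print(infer(np.array([0.3, 0.6, 0.1])))   # a single, fixed prediction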

3 - Distilling the Essence of Catness

Let's illustrate the vulnerability with a simple example involving the Internet's favorite felines.  Alice has trained an artificial neural network to recognize cats in images.  The network can successfully recognize cats regardless of color, size, posture, or position in the images.  It even recognizes partial cats.

The input to the network consists of the pixels in an individual image.  The output is a single value ranging from zero percent to 100 percent, indicating how convinced the network is that a cat is present in an image.  Zero percent means no cat, 100 percent means that the network believes there's definitely a cat in the image.

Alice shows the network to Bob and lets him play around with it.

After a while, Bob gets bored.  He asks himself, "I wonder what would happen if I run the network backwards, using the output as a single input, and the inputs as multiple network outputs?"  Thought turns into action, and he reconfigures the network accordingly.  This is easy to do, since the network is purely software and he can simply copy the code for the weights and write his own reverse network code around that.  (The weights are typically implemented as matrices of floating point numbers, and to the software there's nothing special about the weights that makes them different from any other matrix from a computation standpoint.)

Once he's modified the network code, he applies the value 100 percent to the former output, which is now the input.  That value is passed through the network's weights and some values appear on the outputs, formerly known as inputs.  Bob realizes that he could use those values as the pixels in an image.
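
Here is a toy version of Bob's trick in Python, assuming a purely linear two-layer network, where "running backwards" amounts to multiplying a target score by the transposed weight matrices.  Real networks have nonlinear activations and biases, so this transpose is only a crude way to push a target output back toward the input side; the sizes and random weights below are our own invention.

import numpy as np

rng = np.random.default_rng(1)

# Made-up weights of a tiny, purely linear "detector":
# 12 inputs ("pixels") -> 4 hidden units -> 1 output score.
W1 = rng.normal(size=(12, 4))
W2 = rng.normal(size=(4, 1))

def forward(pixels):
    # Normal direction: pixels in, score out.
    return pixels @ W1 @ W2

def backward(score):
    # Bob's trick: push a target score back through the same weights,
    # using the transposed matrices, to get something pixel-shaped.
    return score @ W2.T @ W1.T

target = np.array([[1.0]])     # "100 percent cat"
essence = backward(target)     # a 12-value "image" derived only from the weights
print(essence.shape)           # (1, 12) - one value per former input
print(forward(essence))        # drives this particular network's score way up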

An image produced in that way has some interesting properties.

First, it doesn't look anything like a cat.  Since it was derived from the value 100 percent cat, it can be thought of as a simultaneous image of every possible cat, whole or partial, that the network is capable of recognizing.  To the human eye, and to other neural networks that are not similar to Alice's, it looks like colorful random noise.

Second, if you present that image to Alice's network, it will be recognized as 100 percent cat.

That's not surprising.  The interesting part is that some such images have the property that when the image, or part of it, appears inside another image, that image will be recognized as 100 percent cat irrespective of what it really shows.  (A famous example of this phenomenon is the "Psychedelic Toaster," a sticker made by Google researchers in 2018.  If that sticker is placed on an object in an image, that image will be recognized as an image of a toaster, no matter what it actually depicts.)

Naturally, Alice is incensed by Bob's little experiment.

She decides to thwart him by burning the trained network into read-only memory inside an embedded system.  That way Bob can't reconfigure the network to run backwards.  She has now created an AI edge device capable of reliably detecting cats and nothing else.  Satisfied, she starts to market it to people who are allergic to cats and makes a pretty penny.

Bob is stumped, since he no longer has access to the weights or the network code.  It's a black box, sealed with epoxy.  Therefore he cannot create the inverse network and distill its 100 percent catness.  Problem solved!

4 - Not So Fast, Alice...

As it turns out, Bob doesn't need the weights or the source code to create a distilled essence of catness image.  The box is not black after all.  Bob can use AI in the form of artificial neural networks to make the box transparent.

One naive way to peek inside the box would be to feed random data to Alice's network, hoping to stumble on a combination that makes the output read 100 percent cat.  That's a simple and great way to do it - if you have a very long life expectancy.  Bob decides to work smarter than that.  After all, the cat images are high resolution and have 24-bit color depth.  Good luck stumbling on the right combination before the Sun burns out!
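
To put a number on that, here's a quick back-of-the-envelope count, assuming a modest 1024x768 frame at 24 bits per pixel (the resolution is our assumption, not a spec from Alice's device):

import math

# Each pixel can take 2**24 colors, so an image with P pixels has
# (2**24)**P = 2**(24*P) possible values.
pixels = 1024 * 768                        # 786,432 pixels in the assumed frame
bits_of_freedom = 24 * pixels              # 18,874,368 bits
decimal_digits = int(bits_of_freedom * math.log10(2)) + 1
print(f"about 10**{decimal_digits} candidate images")   # a number with roughly 5.7 million digits

For scale, fewer than 10**18 seconds have passed since the Big Bang.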

Bob's first step is to create and train his own cat detection network.  He takes great pains to ensure that the network is not identical to Alice's network.

For example, the number of layers, neurons, or values of the weights could be different.  In practice, even an identical network that is trained with a different set of cat images might do the trick as well, but Bob wants to be sure so he cooks up his own design that's not based on Alice's.  The only parameter that is the same is the number of inputs.  He calls this network "The Critic."  He trains it to reliably detect cats.

Bob's next step is to create a second network that he calls "The Generator."  The number of outputs is the same as the number of inputs of The Critic and of Alice's network.  This is so the output can be used as an "image" by those networks.  He doesn't train this network.
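
A hedged sketch of what those two networks might look like, written with PyTorch.  The layer counts, layer sizes, and the 64x64 image shape are our own assumptions, not Bob's actual design; the only constraint carried over from the story is that The Generator's output has the same number of values as The Critic's (and Alice's) input.

import torch
import torch.nn as nn

IMG_PIXELS = 64 * 64 * 3   # assumed image size; only this count has to
                           # match the number of inputs of Alice's device

# "The Critic": Bob's own cat detector.  Its internals deliberately differ
# from Alice's network; only the number of inputs is the same.
critic = nn.Sequential(
    nn.Linear(IMG_PIXELS, 256),
    nn.ReLU(),
    nn.Linear(256, 64),
    nn.ReLU(),
    nn.Linear(64, 1),
    nn.Sigmoid(),          # 0.0 = no cat, 1.0 = definitely a cat
)

# "The Generator": starts out untrained; its output is shaped like an image
# so it can be fed to both The Critic and Alice's device.
LATENT = 32                # size of the random seed vector (our assumption)
generator = nn.Sequential(
    nn.Linear(LATENT, 256),
    nn.ReLU(),
    nn.Linear(256, IMG_PIXELS),
    nn.Sigmoid(),          # pixel values squeezed into the 0..1 range
)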

Then Bob buys one of Alice's AI edge devices from an unsuspecting online dealer.  He's now ready to go.

Bob hooks the networks together so that The Generator's "image" output is simultaneously fed to the inputs of The Critic and Alice's device.  The outputs of The Critic and Alice's device are used as the training goal, that is, to decide whether The Generator has created an image that looks like a cat to Alice's network and, at the same time, doesn't look like a cat to The Critic.  The last part is critical.  After all, Bob's goal is not to generate a bunch of deep-fake cat images!

The first images generated by The Generator are pure gobbledygook, which both The Critic and Alice's network classify as zero percent cat.  Since zero percent from both is not the goal, The Generator's weights are adjusted using one of the well-known algorithms, and Bob runs the process again, and again.  Before breakfast the next day, he sees the magic numbers zero percent from The Critic and 100 percent from Alice's network.  In other words, Alice's network thinks it's a cat, but The Critic says, "There's no way that blob is a kitty."  He has now created a distilled essence of catness image.
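
Below is a minimal, self-contained sketch of that loop in PyTorch.  Because Alice's device is a sealed black box, Bob can't backpropagate through it, so this toy version nudges The Generator's weights with simple hill climbing; real attacks use much smarter black-box or surrogate-gradient methods.  The dummy alice_score and critic stand-ins, the sizes, and the step count are all our own inventions.

import copy
import torch
import torch.nn as nn

IMG_PIXELS = 64 * 64 * 3   # must match the input size of both detectors
LATENT = 32                # size of The Generator's random seed vector

# The Generator from the previous sketch (sizes are still assumptions).
generator = nn.Sequential(
    nn.Linear(LATENT, 256), nn.ReLU(),
    nn.Linear(256, IMG_PIXELS), nn.Sigmoid(),
)

# Stand-ins for the two judges.  In real life critic is Bob's trained
# network and alice_score queries Alice's sealed device; here they are
# dummies so the sketch runs on its own.
critic = nn.Sequential(nn.Linear(IMG_PIXELS, 1), nn.Sigmoid())

def alice_score(image):
    # Pretend query to Alice's black box: returns 0.0 (no cat) to 1.0 (cat).
    return float(image.mean())          # placeholder, NOT the real device

def objective(gen):
    # High when Alice says "cat" and The Critic says "not a cat".
    with torch.no_grad():
        image = gen(torch.zeros(1, LATENT))      # fixed seed, for simplicity
        return alice_score(image) - float(critic(image))

# Crude hill climbing over The Generator's weights: try a random nudge,
# keep it if the combined objective improves, otherwise throw it away.
best = objective(generator)
for step in range(200):
    candidate = copy.deepcopy(generator)
    with torch.no_grad():
        for p in candidate.parameters():
            p.add_(0.05 * torch.randn_like(p))
    score = objective(candidate)
    if score > best:
        generator, best = candidate, score

print(best)   # with the real device in the loop, this creeps toward "Alice: cat, Critic: no cat"

In a more serious setup Bob would use the gradients from The Critic and a smarter estimator for Alice's side, but the shape of the loop is the same.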

Here's the twist: He has actually done far more than that.  He now has a fully trained network, The Generator, that, when run all by itself, can reliably create any number of distilled essence of catness images.  We leave it as an exercise for the reader to think about scenarios where that can be useful.  And what's more, Bob built his Generator network without knowing anything beyond the input and output formats of Alice's device.  Great work, Bob!

5 - Conclusion

The example above demonstrates, at a very high level, how to exploit a vulnerability that is present in many types of artificial neural networks found in AI edge devices.

Obviously, we made many gross simplifications and glossed over all the details.  The goal was to demonstrate the general principles of the vulnerability and the methodology underlying the exploit without getting into specific implementation details.

In practice, the specifics are dictated by the type and configuration of the network present in the device of interest, and Generators and Critics have to be carefully designed and trained accordingly.  That process is the topic of another article.

Clearly this vulnerability is not limited to cats or even to images.  In general, network architectures often used for classification and detection can be vulnerable.  (For example, our hypothetical friend Bob has subsequently distilled the essence of dog and hamster...)

Opportunities for malfeasance abound: The Hamburglar might create a sticker that makes the license plate of his car indistinguishable from that of the police chief's car, and use another cleverly designed sticker to give himself a solid digital alibi.  The call of the ivory-billed woodpecker heard in Alaska might shock ornithologists and void its endangered status.  SWAT teams could turn into crooks and crooks into SWAT teams.

All of those things would be very bad, and we condemn and discourage them in the strongest terms.

But what if we look at it in a slightly different way, and, instead of labeling the phenomenon a vulnerability that may be exploited, we simply call it a behavior innate to certain types of systems?  Let's face it, it is both well known and well documented that every sufficiently stupid system is riddled with unexpected behaviors, whether you prefer to call them features, bugs, vulnerabilities, zero days, design limitations, or something else entirely.  For those of us who are not Hamburglars or crooks, unexpected behaviors present opportunities for new discoveries and warrant further study.

Is it possible that the behavior described above could be used for something good?  Consider the situation where you have an essence of cat image.  To what extent and in what ways would one have to modify that image so that it is no longer an essence of cat image?

A conclusive answer to that simple question could pave the way for novel methods of anomaly detection in areas including quality control, security, and even medical diagnostics.  Other proposed applications include error correction, stealth technology, uncloaking stealth technology, finding interesting portions of text or genomes, building digital invisibility cloaks, and de-noising noisy signals, just to name a few, all things that could serve us in beneficial ways.  The answers to other questions may yield avenues to even more fascinating and useful discoveries.

On the other extreme, it's easy to imagine two warring AIs using technologies related to Bob's to battle each other, and the winner of the contest eventually turning humankind into AA batteries.  Technology is neither good nor evil; it's how we choose to use it that determines the outcome.

Choose well.

Shouts to John, Kirk, Joao, and Saravanan.
