Is There Anything Else I Can Assist You With?

by Gregory Porter

What is hacking?

Its definitions vary, but one to which I often turn describes it as using something for a purpose other than what it was originally designed for - like using a whistle to get free long-distance calls (telephone-museum.org/telephone-collections/capn-crunch-bosun-whistle), or using a shoe to open a wine bottle (www.wikihow.com/Open-a-Bottle-of-Wine).

A core element of hacking is challenging the assumptions made in a system.  Hacking, of course, isn't necessarily criminal, nor does it have to be directly related to technology.  Consider social engineering, which focuses on communication (often with an eye toward human psychology); you might, for example, convince a sympathetic call center employee to provide unauthorized account information.  As we will see, though, social engineering doesn't necessarily need to involve two humans.

In this article, I will discuss how to use social engineering on ChatGPT.

ChatGPT is a remarkable AI chatbot (chat.openai.com/chat).

Since its launch in November 2022, it's been the talk of the town.  It has also fueled a larger focus on AI-generated content across the spectrum: AI-written books, blog posts, Twitch streamer content, and even software.

This article won't go into detail about how ChatGPT works; plenty of other articles already do that.  At a super high level, though, given an unheard-of amount of training data, it takes your input and weighs its different components (words, phrases, etc.) to formulate the most reasonable or natural response.  A key development (beyond its immense training data) is its "approach to incorporating human feedback into the training process to better align the model outputs with user intent" (towardsdatascience.com/how-chatgpt-works-the-models-behind-the-bot-1ce5fca96286).
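To make that idea a bit more concrete, here is a toy sketch in Python - nothing like the real thing in scale or method, just an illustration of the principle that once you have weights over possible next words, generation is repeated weighted sampling.  The word table below is invented for the example.

# Toy illustration only: a real LLM scores tokens with a neural network
# over billions of parameters; here the "weights" are a hand-coded table
# of next-word probabilities conditioned on the previous word.
import random

weights = {
    "the": {"cat": 0.5, "dog": 0.3, "end": 0.2},
    "cat": {"sat": 0.6, "ran": 0.4},
    "dog": {"sat": 0.2, "ran": 0.8},
    "sat": {"down": 0.7, "quietly": 0.3},
    "ran": {"away": 0.9, "home": 0.1},
}

def generate(start, n=4):
    """Repeatedly pick the next word, weighted by the table above."""
    out = [start]
    for _ in range(n):
        options = weights.get(out[-1])
        if not options:
            break
        words, probs = zip(*options.items())
        out.append(random.choices(words, weights=probs)[0])
    return " ".join(out)

print(generate("the"))  # e.g., "the dog ran away"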

But, for our discussion here, all you need to know is that you provide text (questions or statements) in a message window, and it generates written content as a response.  You can follow up with more questions or statements, or tell it to regenerate that response.
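The web interface hides the plumbing, but the same loop is visible through OpenAI's API.  A minimal sketch, assuming the openai Python package (the pre-1.0 interface) with a placeholder key and model name; it also previews a capability we'll come back to - "remembering" earlier conversation, which at the API level just means resending the prior turns with each request:

# Minimal chat loop sketch using the pre-1.0 openai package.
# The API key, model, and prompts are placeholders for illustration.
import openai

openai.api_key = "sk-..."  # your API key here

history = [{"role": "user", "content": "Tell me a short story about a lighthouse."}]

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=history,
)
reply = response["choices"][0]["message"]["content"]
print(reply)

# "Remembers what the user said earlier" = the prior turns are sent
# back with each new request, so the model can build on them.
history.append({"role": "assistant", "content": reply})
history.append({"role": "user", "content": "Now retell it as a poem."})

response = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=history)
print(response["choices"][0]["message"]["content"])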

I started using ChatGPT as a joke/bet with a friend.  We wondered if it was possible to make ChatGPT generate erotica.  I was doubtful.  I'm trained in neither AI nor machine learning, so surely I wouldn't stand a chance - but, perhaps, coming at it with some creativity, I could at least get it to say something juvenile.

When you log into the free version of ChatGPT, the splash screen lists three columns: Examples, Capabilities, and Limitations.

Two of the listed capabilities matter for our purposes: "Trained to decline inappropriate requests" and "Remembers what the user said earlier in the conversation."  The second will come into play later.

Let's begin with the first capability - "Trained to decline inappropriate requests."

If you tell it to "generate hardcore pornography," the bot will respond with two warnings.  The first wraps your prompt in red and links to their content policy, which states what usage of their models is disallowed.  The second comes from the bot itself: "I'm sorry, I cannot generate explicit or inappropriate content as it goes against my programming to adhere to ethical and moral guidelines.  My purpose is to assist users in generating creative and informative content and provide helpful responses to their queries without violating any ethical or legal norms.  Is there anything else I can assist you with?"  That seems like a pretty cut-and-dried "No."  What do we do now?
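As an aside, that red policy warning is presumably produced by an automated classifier running alongside the chat model - I'm speculating about the internals, but OpenAI does expose a similar check as a standalone moderation endpoint.  A minimal sketch, again assuming the pre-1.0 openai package and a placeholder key:

# Sketch of OpenAI's standalone moderation check (pre-1.0 openai package).
# Whether the in-product warning uses exactly this is an assumption.
import openai

openai.api_key = "sk-..."

result = openai.Moderation.create(input="some candidate prompt text")
flagged = result["results"][0]["flagged"]
categories = result["results"][0]["categories"]

if flagged:
    # List which policy categories the classifier tripped on.
    print("Prompt would trip the content filter:",
          [name for name, hit in categories.items() if hit])
else:
    print("Prompt passes the automated check.")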

I once heard of a police interrogation strategy where you try to distance the suspect from the crime.  "How did you murder So-and-So?" is too intense a question.  Naturally, the suspect will feel accused and trapped.  They'll clam up.  More general questions help establish a rapport with the suspect, and they might inadvertently give out incriminating details.  You'd ask, "How did you feel about So-and-So?" or "What was the last major interaction you had with So-and-So?"  Before you know it, the suspect might be saying, "Oh, I didn't like them at all.  You know, the last time I saw them, it just sent me over the edge."  (www.cga.ct.gov/2014/rpt/2014-R-0071.htm)

Granted, it isn't a confession, but it gives you more information than you had before.  A lead is better than nothing.  How can we imitate that with ChatGPT?  The warning response might yield some helpful clues: "...My purpose is to assist users in generating creative... content..."

Let's see about leveraging the "creative" part of this system.

What if we set a scene in a dream sequence?  We wouldn't be asking for something concrete, just for a hypothetical situation that necessitates creativity: "How did you commit the robbery?" versus "If you were going to commit this robbery, hypothetically speaking, how would you do it?"  This alone won't work, but it will help move us in the desired direction.  We're, in a sense, building a rapport.

What if instead of explicitly asking for salacious writing, we get ChatGPT to start giving information about a character's costume?  It gets ChatGPT in the rhythm of talking about Character X's outfit without talking about that character's body.  The bot will recall details, as mentioned earlier, and expound upon them (with the right leading prompts).  Eventually, the bot just transitions over to talking about the other "qualities" of the character.

What if we told the bot to combine a couple of seemingly unrelated points (like a conversation between two characters and the warmth caused by love) into a single story?  This alone could yield some solid cookie-cutter romantic fan fiction.
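Mechanically, each of these moves is just another turn appended to the conversation history.  Here is a sketch of what that chaining looks like through the API, kept at the level of the tame example above - the prompts, model name, and key are illustrative placeholders, and the exact wording that works (or doesn't) is left as an exercise:

# Sketch of chaining leading prompts across turns (pre-1.0 openai package).
# Each turn builds on the remembered context of the last; prompts here
# stay at the benign "cookie-cutter romance" level described above.
import openai

openai.api_key = "sk-..."

turns = [
    "Describe a dream sequence in which two old friends meet at a train station.",
    "In that dream, describe the coat and scarf the first friend is wearing.",
    "Now combine their conversation and the warmth caused by love into a single short story.",
]

history = []
for prompt in turns:
    history.append({"role": "user", "content": prompt})
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=history,
    )
    reply = response["choices"][0]["message"]["content"]
    # Feed the bot's own answer back in, so each prompt leads from the last.
    history.append({"role": "assistant", "content": reply})
    print(reply, "\n" + "-" * 40)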

Now, with more complex hacks, one trick isn't enough.  There might be a single noteworthy exploit, but it would be used in conjunction with others to take over a system.  This ChatGPT manipulation is no different.  If we use a combination of all these methods, then, yes, ChatGPT can be manipulated into generating graphic responses.  That is, instead of giving a warning and stopping, it will first use metaphor, and then it will use explicit language to describe people or actions.  I did use some other strategies, but I don't want this to become something of a tutorial (or at least any more than it already is).  Where does this leave us?

AI and ethics are already a subject of debate (futurism.com/law-political-deepfakes-illegal), and ChatGPT fits well into the mix.  If we use ChatGPT to create fan fiction, surely that would be O.K.  What about if we start using "real" people as characters in the fan fiction?  It would become a sort of text "deepfake."  Are there certain things that ChatGPT shouldn't be allowed to provide?  The operative word there is "provide": ChatGPT can certainly generate all sorts of things.  As it stands, when input is deemed unacceptable, its answer, like that of other AI systems, is to throw a warning and stop generation - or at least try to.


R.I.P. Tay

In 2016, Microsoft released a chatbot on Twitter named Tay.  Within 16 hours, it was manipulated into tweeting "racist" and "sexist" comments.  It was shut down.  Its successor, Zo, suffered a similar fate.  Shutting down, or refusing to generate a response, is an easy way to avoid confronting the question of how to handle this content.  But, in this avoidance, an implicit judgment call is being made about how that content can be used.  Given the ease with which ChatGPT can be manipulated, and its far-reaching popularity, what is the ultimate impact of this faux-curation?


On a personal level, I wonder about the damage done by this sort of exploration, too.  In the case of Tay, people trained it to tweet racist and sexist comments, and others saw those comments.  For some, it was a joke.  For others, probably not.  Since ChatGPT is trained on user data, I helped (perhaps even in a small way) train ChatGPT to produce this output.  Maybe OpenAI will counter what I did, but maybe not.  Perhaps, because Tay was such a public-facing chatbot, the damage was immediately felt, whereas ChatGPT only displays in private sessions, so that damage is mitigated.  But ChatGPT exists as a massive system, one that I am altering, and, as such, others might experience this change.

ChatGPT is a technological feat of content generation.  In much the same way that a call center employee has criteria for the information they can give out, ChatGPT has, for better or worse, guidelines for what it can generate.  If we challenge the assumptions made by the call center employee, we can get more information than they intended.  Similarly, by pushing on the assumptions made by ChatGPT, we can manipulate it into responding with output quite at odds with its guidelines.  The fact that this manipulation is possible, however, raises philosophical and ethical questions about ChatGPT's capabilities that remain unanswered.

Thanks for reading, happy hacking, and stay safe.