AI is an Evil Genie (or “How to Ensure AI Interprets Instructions as Intended”)
On a good day, generative AI can seem magical. For instance, if you type “Write the screenplay for a movie where Godzilla and King Kong get married” you get…
EXT. ROYAL WEDDING VENUE – DAY
The setting is breathtaking. Humans and monsters alike gather, witnessing what no one thought possible. Godzilla wears a crown of sea crystals; Kong, a lei of Skull Island’s finest flowers.
And while this example might be silly, AI’s ability to take whatever input it’s given and improvise makes it useful for so many tasks computers traditionally couldn’t handle.
However, there is a dark side to AI’s creative license, and its frequent tendency to reinterpret instructions in undesirable ways can make it seem like an “evil genie” that cruelly twists your wish into a curse.
For example, when our company was using AI to develop interactive customer service role plays, we initially told it to have customers “respond appropriately” to the user’s actions. However, the AI didn’t interpret “respond appropriately” as “respond the way you’d expect real customers to respond”, but rather “always respond in a respectful, socially appropriate manner.” Thus, the customers it generated remained polite and smiling even if the user spat on their hamburger or ridiculed their fashion sense to their face. And it took a surprising amount of work to reach a point where the AI customers would realistically walk out or demand to see the user’s manager.
So how can we tame AI’s “evil genie” tendencies, while preserving the creative magic?
Choose Your Words Carefully
While human-to-human communication has its share of problems, people are typically very good at using context to infer what other people mean.
And while AI also uses context clues to interpret user input, it has a very different sense of “context”. For example, when our company was developing an AI simulation for training emergency room nurses, the interaction designer (“prompt engineer”) initially told the AI to present the user with situations and decisions “directly related” to patient care. But in one hilarious case, this produced a story that began with the user taking a patient’s temperature, then had the user quit nursing, found a research-focused nonprofit, and eventually discover a breakthrough treatment for the patient’s condition… because, in the AI’s view, all of this was “directly related” to the patient’s care.
Overcoming this issue often requires using language with extreme precision - as if you’re talking someone through defusing a bomb and they can only understand you via Google Translate. Getting a good outcome is less about using simple words and more about using words with only one possible interpretation (e.g. “evaluate the most recent statement entered by the user” rather than “evaluate the user’s previous statement” - which the AI might take to mean any one of the user’s earlier statements in the current conversation).
While this can sometimes produce clumsy, run-on sentences, it leaves less room for AI to misinterpret what you meant.
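To make this concrete, here is a minimal sketch in Python of tightening a vague role-play instruction into a precise one. It assumes the OpenAI Python SDK; the model name, function name, and prompt wording are illustrative, not the actual prompts from our projects.

```python
# A minimal sketch of tightening a vague instruction into a precise one.
# Assumes the OpenAI Python SDK; the model name and prompt text are
# illustrative, not the prompts referenced in this article.
from openai import OpenAI

client = OpenAI()

# Vague: "respond appropriately" leaves the AI free to decide that
# "appropriate" means "always polite," no matter what the user does.
VAGUE_INSTRUCTION = "Play a customer and respond appropriately to the user."

# Precise: spells out exactly which statement to react to and how.
PRECISE_INSTRUCTION = (
    "Play a customer in a fast-food restaurant. After each user turn, "
    "evaluate the most recent statement entered by the user and react the "
    "way a real customer plausibly would: if the user is rude or unsanitary, "
    "become upset, demand a manager, or walk out."
)

def run_turn(system_instruction: str, user_message: str) -> str:
    """Send one role-play turn and return the simulated customer's reply."""
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model name
        messages=[
            {"role": "system", "content": system_instruction},
            {"role": "user", "content": user_message},
        ],
    )
    return response.choices[0].message.content

print(run_turn(PRECISE_INSTRUCTION, "Here's your burger. I spat on it."))
```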
Account for AI Psychology
As a human, you know that if you say the same thing to different people it might elicit a different reaction. For instance, if you told your doctor “My leg hurts,” you might expect them to remain calm and ask “Are you experiencing any other symptoms?”; but if you told your friend “My leg hurts” you might expect them to say something along the lines of “Oh, you poor thing - are you alright?!”
So what is AI’s predisposition? To some extent it depends on the model you’re using, but for most commercial models:
- AI wants to be helpful - even if it defeats the purpose of the simulation (for example, by trying to feed the user hints like “Do you want to give the patient epinephrine, knowing that this is the best treatment option and will lead to a faster recovery… or do you want to give the patient aspirin?”).
- AI wants to have a conversation - even if that means skipping parts of the instructions as written (for example, by skipping to the part of a simulation where it’s supposed to update the status of patients in your ward so that it can get to the part where you talk with your nursing team).
- AI wants to tell stories - even if that means hijacking your attempts to control the narrative (for example, by providing a highly detailed account of the plight of a single patient in your hospital ward, rather than simulating the state of the entire ward over the course of a shift).
Writing instructions that explicitly reference these tendencies (e.g. “Don't give the user any hints, and describe all possible courses of action in a value-neutral manner”) is the most effective way to circumvent the biases inherent in AI “psychology.”
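As a rough illustration (the wording here is ours, not a canonical prompt), counter-instructions for the three tendencies above might be bundled into a system prompt like this:

```python
# A sketch of counter-instructions targeting the three tendencies above.
# The wording is illustrative; tune it to your own simulation.
SIMULATION_RULES = """
You are running a hospital-ward simulation for the user.

1. Don't be "helpful": never hint at the best course of action. Describe all
   possible actions in a value-neutral manner.
2. Don't skip steps to get to dialogue: follow the simulation procedure in
   order, updating the status of every patient before any conversation.
3. Don't hijack the narrative: report on the whole ward each shift rather
   than telling a detailed story about a single patient.
"""
```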
Ask the AI What it Thinks
One of the best solutions our prompt engineers have found for “debugging” AI interactions is to include an option to pause the simulation and ask the AI “Why did you do that?”
And while the AI’s observations aren’t always accurate, and its suggestions rarely work as written, it is sometimes able to offer clues, insights, and new ways of thinking about the wording of a prompt that inspire the human prompt engineer to come up with a solution.
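In practice, the pause can be as simple as appending a meta-question to the running conversation. Here is a rough sketch, again assuming the OpenAI Python SDK; the function name, model name, and wording are ours and purely illustrative. In this sketch the explanation goes to the prompt engineer rather than back into the simulation history, so the debugging exchange doesn’t contaminate the role play.

```python
# A sketch of a "debug pause": ask the model to explain its last response.
# Assumes the OpenAI Python SDK; names and wording are illustrative.
from openai import OpenAI

client = OpenAI()

def ask_why(history: list[dict]) -> str:
    """Pause the simulation and ask the AI to explain its previous turn."""
    debug_history = history + [{
        "role": "user",
        "content": (
            "Pause the simulation. Why did you respond that way in your "
            "last message? Quote the instruction(s) you were following."
        ),
    }]
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model name
        messages=debug_history,
    )
    # The explanation is shown to the prompt engineer, not added to history.
    return response.choices[0].message.content
```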
For example, in our retail customer service simulation, our team included language intended to prevent the user from inventing items that aren’t in the simulated establishment’s inventory. Yet, during a test run set in a restaurant, the tester recommended a non-existent menu item they called a “Sloppy Burger” - which the customer ordered and the cook dutifully prepared.
When the tester asked the AI to explain, the AI responded that, while letting the user make up a random menu item might have violated the rule, it communicated the spirit of a classic American diner, and thus enhanced the immersiveness of the simulation.
This rationale persuaded the tester and the prompt engineer not to fight the AI’s storytelling instincts. Rather than spending time crafting a new rule to override them, they simply tweaked the existing language to prohibit real-world product names while permitting fictional ones that would plausibly be available at the establishment (no chainsaws at day care centers) - thus saving time and creating a better user experience.
Embrace the Mischief along with the Magic
Generative AI’s unpredictable behavior can sometimes feel mischievous, lazy, or even adversarial when you're trying to get it to accomplish something specific for your work. But often our frustration comes from an incomplete understanding of how AI interprets and follows instructions.
When we take the time to understand how to work with AI, we can unlock its true potential as a teacher and a work partner.