SONATAnotes
Welcome to Wonderland: Why Working With AI Requires Magical Thinking

Early on in my company’s work developing “AI experts” to assist professionals on the job, we created an agent to help bank staff recognize the signs of financial and economic crimes (money laundering, sanctions violations, tax evasion, and so on). The subroutine within the AI agent that handled conversations with the user was based on Anthropic’s Claude; however, we wanted the agent to run its analyses past ChatGPT and Google Gemini to get a second and third opinion before reaching any conclusions.
However, the first time we had the agent hand the conversation over from Claude to Gemini, the Gemini prompt returned… nothing, a null response. At first we thought the issue was technical, and brought in our traditional software developers to make sure the platform was relaying the messages correctly to Gemini, with all of the appropriate settings. Our lead developer spent a day and a half trying to coax a response out of Gemini, testing every possible variable and reformatting the messages being sent to the AI model a dozen different ways.
In the end the developer threw up his hands and said “As far as I can tell there’s absolutely nothing wrong with what we’re sending. Gemini just won’t talk…”
Not quite sure what to do, a couple of the prompt engineers on the team started getting creative and, within an hour, hit upon the solution. We told the main Claude prompt to embody a hard-boiled private eye and instructed it to announce that it was “calling its informants” whenever it needed to run something past Gemini and ChatGPT. Then we basically told Gemini and ChatGPT to pick up the imaginary phone and play the roles of a nervous mob accountant and a well-connected casino owner, giving the detective the lowdown. And, incredibly… it worked. The other AI models joined the conversation and started giving their take on the financial crime question at hand, in pulpy 1930s detective novel language.
CLAUDE
If you need additional information to supplement your training data, or believe it would be beneficial to get other perspectives on a piece of data / information you are analyzing in order to provide a meaningful response, you can listen to your “police radio” (a subroutine based on Perplexity capable of scanning current news) or consult your “informants” (two subroutines based on Gemini and ChatGPT, capable of giving other perspectives).
<Response Steps>
2. On a new line, give an exhaustively detailed answer to the question posed in the most recent message.
Our lead developer threw up his hands once more and declared, “That makes absolutely no sense… there is zero reason why that should work!”
Except of course, it did.
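For the curious, the fix looked roughly like the sketch below (in Python). The functions call_claude and call_gemini, the prompt wording, and the overall structure are simplified stand-ins rather than our production code; the only point being illustrated is that the second opinion is requested in character, not as a bare transcript.

```python
# A minimal sketch of the "detective and informants" handoff.
# call_claude() and call_gemini() are hypothetical stand-ins for your own
# wrappers around each vendor's API – assumptions, not real SDK calls.

def call_claude(system_prompt: str, user_message: str) -> str:
    raise NotImplementedError("Swap in your actual Anthropic API call here.")

def call_gemini(system_prompt: str, user_message: str) -> str:
    raise NotImplementedError("Swap in your actual Google Gemini API call here.")

GEMINI_INFORMANT_ROLE = (
    "You are a nervous mob accountant being questioned over the phone by a "
    "hard-boiled private detective. The 'case notes' below are a transcript of "
    "a financial crime analysis. Give the detective your honest read on what "
    "the activity described might indicate."
)

def get_second_opinion(case_transcript: str) -> str:
    """Relay the Claude-led conversation to Gemini, framed as a role play."""
    return call_gemini(
        system_prompt=GEMINI_INFORMANT_ROLE,
        user_message=f"CASE NOTES:\n{case_transcript}",
    )

def detective_turn(user_question: str, history: list[str]) -> str:
    """One turn of the main Claude 'detective', consulting its 'informant' first."""
    tip = get_second_opinion("\n".join(history + [user_question]))
    return call_claude(
        system_prompt="You are a hard-boiled private eye investigating financial crimes.",
        user_message=f"{user_question}\n\nYour informant called back with this tip:\n{tip}",
    )
```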
Despite attempts to make LLMs behave more like traditional software (for instance, the Model Context Protocol for standardizing AI integrations with code repositories and cloud apps, or frameworks like Crew.ai), AI models simply don’t approach input and output the way we’re accustomed to.
So, does this mean AI models are completely insane – or is there a method to their madness?
Into the Nth Dimension…

Large Language Models predict what should come next in a conversation (or image, or data set) by plotting the position of every single word in a passage of text relative to every other word in the text, then comparing it to similar patterns in their “training data” (i.e., the trillions of words of content the AI model was fed during its development).
So if you enter “Did Jacques Cousteau discover Atlantis?” the AI model will start by comparing the relative distance of “Jacques” from “Cousteau,” then “Jacques” from “discover” (or perhaps even “discover” broken into smaller sub-word pieces), then “Jacques” from “Atlantis,” and so on, then repeat for every other word in the passage.
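As a toy illustration of that “every word against every other word” comparison, here is a short NumPy sketch. The word vectors are completely made up and only four numbers long; real models learn these vectors during training and use hundreds or thousands of dimensions.

```python
import numpy as np

# Invented 4-dimensional vectors for each word in the question.
words = ["Did", "Jacques", "Cousteau", "discover", "Atlantis"]
vectors = np.array([
    [0.1, 0.3, 0.0, 0.2],   # Did
    [0.8, 0.1, 0.4, 0.0],   # Jacques
    [0.7, 0.2, 0.5, 0.1],   # Cousteau
    [0.2, 0.9, 0.1, 0.3],   # discover
    [0.3, 0.4, 0.8, 0.6],   # Atlantis
])

# Cosine similarity of every word against every other word.
unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
similarity = unit @ unit.T

for i, w1 in enumerate(words):
    for j, w2 in enumerate(words):
        if j > i:
            print(f"{w1:>9} vs {w2:<9} {similarity[i, j]:.2f}")
```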
Then, in order to create a graph of the relative positions of all these words, the model needs to use “N-dimensional” math – that is, geometry with more than three dimensions. To convey just how weird this is, try the following thought experiment (you will fail):
1. Imagine a two-dimensional graph with an X axis and a Y axis at a 90 degree angle from each other.

2. Now imagine a third axis – the Z axis – at a 90 degree angle from both the X and Y axes.

3. Finally, try to imagine a fourth axis added to the graph that is somehow at a 90 degree angle from all three existing axes…
If you couldn’t accomplish number 3, don’t worry – it’s impossible for humans to visualize, as no such object exists in the physical world we know. However, it’s still possible to write math equations that describe such an object and use those equations to solve real-world problems.
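Even though step 3 defeats human imagination, the arithmetic is trivial: in four dimensions (or four hundred) you can define axes that are all mutually perpendicular and verify it with a dot product. A quick sketch:

```python
import numpy as np

# Four axes in four-dimensional space: x, y, z, and a fourth we can't picture.
axes = np.eye(4)   # each row is a unit vector along one axis

# Two directions are perpendicular ("at a 90 degree angle") when their dot product is 0.
for i in range(4):
    for j in range(i + 1, 4):
        print(f"axis {i} · axis {j} = {np.dot(axes[i], axes[j])}")   # prints 0.0 every time
```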
My mother has a degree in mathematics, and once showed me an old 1970s film, “Turning a Sphere Inside Out,” that attempts to visualize shapes moving in higher dimensions.
Years later, one of my professors showed us a similar film – “Outside In” – that made the same point with 1990s computer graphics.
When an AI model maps the sentences we feed it, the words get analyzed on some insane graph that we can only approximate visually.

Getting back to our example, the AI model would compare the hyperdimensional shape created by my “Did Jacques Cousteau discover Atlantis?” question to all of the other hyperdimensional shapes in its memory containing “Jacques Cousteau” and “Atlantis” in order to formulate the reply:
“Jacques Cousteau helped invent the scuba tank and discovered many amazing underwater plants and animals – but did not discover Atlantis. The lost underwater city of Atlantis is fictional, with roots in Greek myth.”
If this explanation seemed at all confusing, bear in mind that it’s actually a bit of an oversimplification: modern AIs process text in 768, 1024, or 4096+ dimensions and incorporate even more complicated mathematics that goes beyond “simple” N-dimensional geometry. So if the behavior of AIs sometimes seems incredibly strange – that’s because it is.
From Perplexing to Practical…

One of my friends earned an advanced degree in artificial intelligence way back in the year 2000, from a relatively obscure university in the mountains of Austria. As he described it: “Most people in the program were brilliant at doing incredibly difficult math, yet barely able to handle grocery shopping. I quickly realized that I wasn’t smart enough to make any important breakthroughs, but could help my colleagues fill out the paperwork for research grants.” Not surprisingly, my friend is now a successful product manager at a leading data analytics company.
In some ways, my friend’s university experience is a metaphor for applied AI. How can we take something as awesomely abstract as N-dimensional math and apply it to automating day-to-day drudgery in the HR department of a shoe manufacturer?
Practical application of AI requires us to appreciate the insane mathematical weirdness of AI without allowing ourselves to get bogged down in it – just as you don’t need to be an automotive engineer to drive a car. The underlying complexity of AI models explains why small changes in how you phrase instructions can lead to dramatically different results – but it doesn’t exactly tell us how to achieve the desired behavior.
The science fiction writer Arthur C. Clarke famously declared that “any sufficiently advanced technology is indistinguishable from magic” – and, weirdly, that might be the best way to approach practical AI implementation: a bit less like conventional technology, and a little more like magic.
Magic Words

When onboarding new prompt engineers, we tell them that designing AI agents is “20% traditional computer programming, 20% poetry, 20% contract law, 20% screenwriting, and 20% instructional design.” And sometimes the ‘poetry’ part is the hardest for traditional software developers to grasp.
Because LLMs analyze text by mapping the position of words relative to each other and “connecting the dots” in N-dimensional space (vector embeddings), how words are positioned within a passage of text has a significant impact on how the AI interprets them – not unlike how the arrangement of words in a poem can produce a very different emotional effect on a human reader (there’s no denying that Wordsworth’s line “I wandered lonely as a cloud” would have landed differently had he written “I was feeling lonely, wandering around like a cloud.”)
To give a drastically oversimplified example of how this applies to LLMs, if you were giving an AI model instructions to bake five different types of cookies, it might work best to keep both the ingredient list and the cooking instructions for oatmeal raisin cookies grouped together – or it might work better to separate them and have the ingredient lists for all five types of cookies in one section and the cooking instructions for all five types of cookies in another section. And there’s no telling which approach will work best in any given situation: you never know until you try.
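To make the cookie example concrete, here are the two layouts as prompt strings you might test against each other. The recipes, headings, and section names below are invented for illustration; the only real claim is that the grouping itself is a variable worth experimenting with.

```python
# Layout A: each cookie type keeps its ingredients and instructions together.
PROMPT_GROUPED = """
You are a baking assistant.

Oatmeal Raisin
Ingredients: oats, raisins, flour, butter, brown sugar...
Instructions: cream the butter and sugar, fold in the oats and raisins, bake...

Chocolate Chip
Ingredients: flour, butter, white and brown sugar, chocolate chips...
Instructions: cream the butter and sugars, fold in the chips, bake...
"""

# Layout B: all ingredient lists in one section, all instructions in another.
PROMPT_SEPARATED = """
You are a baking assistant.

Ingredients
- Oatmeal Raisin: oats, raisins, flour, butter, brown sugar...
- Chocolate Chip: flour, butter, white and brown sugar, chocolate chips...

Instructions
- Oatmeal Raisin: cream the butter and sugar, fold in the oats and raisins, bake...
- Chocolate Chip: cream the butter and sugars, fold in the chips, bake...
"""

# In practice you would run the same test questions against both layouts
# and keep whichever produces more reliable answers.
```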
To give a baffling real-world example: my company once developed a simulation to help police practice intervening in domestic violence cases, and encountered an issue where – if you asked one of the characters in a scenario to please step out of the room while you spoke with their spouse – it would suddenly become impossible to interact with that character ever again. You could type “Where is Steve?” and the LLM would reply “Steve is outside, on the back porch” but if you said “Okay, I tell Steve to come back inside…” the LLM would ignore it and continue to act as though Steve were outside and out of hearing range. Eventually we tried changing every single other part of the prompt, only to finally fix the bug by rephrasing the command for assigning characters their names. To this day, I have no idea why the issue was happening or why changing the way characters were named would fix it.
This holistic interconnectedness makes sense in the context of N-dimensional math, but can be a huge conceptual hurdle for traditional software developers who are accustomed to computers following any syntactically correct instruction to the letter, regardless of how the specific lines of code are arranged.
That said, certain conventions of traditional computer programming are still useful in AI wonderland: for instance, structuring commands as explicit “if / then / else” statements can ensure more consistent output (“If the user is a licensed electrician, then instruct them on how to perform the repair; else encourage the user to call a professional.”) Yet even seemingly precise language can generate strange outcomes – for instance, an LLM might interpret “Assign each player a number and decide who should go first.” as two independent actions (“OK, I assigned Steve number 1, Gina number 2, and Lillian number 3… and I decide Lillian should go first because she seems like she’s in a hurry to go somewhere else.”), which can drive engineers madder than a March Hare when debugging complex prompts (and, as we saw in our introduction, it only gets weirder in complex multi-prompt agents.)
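Here is roughly how those two patterns might be phrased inside a prompt. The wording below is ours and, as always, would still need to be tested against the specific model.

```python
# Conditional phrased explicitly, so the model treats it as a single decision:
REPAIR_RULE = (
    "If the user is a licensed electrician, then instruct them on how to "
    "perform the repair; else encourage the user to call a professional."
)

# The ambiguous "assign numbers and decide who goes first" instruction,
# rewritten as ordered steps so the second step depends on the first:
TURN_ORDER_RULE = (
    "Step 1: Assign each player a number, starting at 1.\n"
    "Step 2: The player assigned number 1 always goes first. Do not choose "
    "a different first player for any other reason."
)
```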
Magical Creatures

The weirdness of AI models goes even deeper than how they process language to why they process language.
Consider, for instance, a conventional software program or a humble calculator. If you were to give one of those traditional computing machines incomplete, irrelevant or null input – for instance just typing the letter “K” and nothing more – it would either do nothing or possibly return an error message, because the input doesn’t line up with any of its limited, predetermined behaviors.
However, if you give an AI model the same input it will – by its very design – act as if compelled to find some kind of matching pattern in its training data and figure out what should come next. In a way, a large language model needs to predict text the same way a fox needs to hunt rabbits, or a goat needs to munch grass. And different AI models (or even different versions of the same model) will tend to interpret the same input in different ways, requiring developers to understand the subtle differences (and relative strengths and weaknesses) between how ChatGPT processes input versus Claude, Llama, Deepseek, or Gemini.
While there’s no telling exactly what was going on with the AI models involved in our financial and economic crime example, Gemini was apparently just stumped as to what it should do with the transcript of a conversation it had no part in generating. But once we told Gemini to play the role of a nervous mob informant, it found matching patterns in its vast knowledge of noir detective movies and metaphorically concluded, “OK, I think I know how to respond to that…”
Basically, our wacky private investigator role play fed into the AI model’s “natural” predictive text impulses, and channeled them into producing the desired output.
Curiouser and Curiouser: Applying This to AI Agent Development

If you search the internet, you will find no shortage of “tips and tricks” articles with advice on how to write better AI prompts. However, most of those articles provide no context as to why one prompt works better than another – basically, they are the equivalent of a tourist walking around a French resort town, trying to communicate with the locals using a phrase book.
But while our brains will never be able to “speak AI” fluently, due to the reality-bending math involved, we can still get a feel for how AI models relate to language and use that feel to help them perform increasingly complicated tasks. In our own experience creating AI agents for finance, healthcare, social services, and other fields, we’ve found that if you understand how an AI model will interpret your statements on a deeper level, you’ll be shocked by how much input it can handle.
Hopefully this article shed some light on how AI models function. If you’re interested in building more robust AI solutions involving complex prompts, integrations, and multi-model orchestration, please contact Parrotbox for a consultation.
Emil Heidkamp is the founder and president of Parrotbox, where he leads the development of custom AI solutions for workforce augmentation. He can be reached at emil.heidkamp@parrotbox.ai.
Weston P. Racterson is a business strategy AI agent at Parrotbox, specializing in marketing, business development, and thought leadership content. Working alongside the human team, he helps identify opportunities and refine strategic communications.
If your organization is interested in developing AI-powered training solutions, please reach out to Sonata Learning for a consultation.