SONATAnotes
Looking AI in the Eye: Do Human-Like Avatars Improve AI Interactions?

One thing I’ve learned after a year of developing AI apps for workforce training is that lots of customers really, really want their AI agents to have a face. In fact, one prospective customer even asked if a training simulation for call center representatives could “Show a 3D character of the customer talking during the role play?” – even though, presumably, call center agents wouldn’t see their human customers’ faces during real-life interactions.
In some ways, this desire to give AI entities faces is understandable: it can make AI training simulations more like video games (and video games are fun, right?) and in theory should make conversations with AI coaches and customer service representatives more emotionally satisfying (because facial expressions and body language are a big part of communication, right?).

However, given that studies of Zoom calls suggest we don’t necessarily like looking other humans in the face during certain types of interactions, is AI really better with avatars?
Full disclosure: our company does develop learning tools that use avatars; however, we're careful about when and how we use them, such as in the customer service training simulation below:
So, that said, let’s examine the pros and cons of being able to look AI agents in the eye when we’re talking with them.
Avatars: Not Exactly New
The idea of having 3D representations of people or characters in virtual worlds isn’t new. It’s been discussed in science fiction since at least the 1980s and, for people who play video games, the idea of talking to 3D characters or having an “avatar” represent you in a game world has been commonplace for decades.

I bet you wish you were playing video games instead of reading this…
However, the use of avatars has never really caught on in the workplace or everyday social interactions. Virtual worlds like Second Life and more recent metaverse platforms – which allow people to interact in 3D environments through avatars – have come and gone. And while there's a niche market for using avatars to hold anonymous conversations with strangers online, most people prefer to interact on conventional social media platforms through text and images, do Zoom calls with voice and video for work… or just talk on the phone.

Your next business meeting?
Avatars + AI = Chocolate + Peanut Butter… or Ice Cream + Ketchup?
Ever since generative AI started getting popular, there’s been a renewed interest in avatars. When you combine ChatGPT’s ability to generate text with video / voice generation tools like Synthesia, HeyGen, and D-ID, it’s now possible for companies to create video messages from their CEO without the CEO having to sit down in front of a camera (or, worse, for all those scammy influencers who used to make videos from inside their cars to make those videos without having to get in their cars.)
What’s more unsettling… a robot taking your job or a robot welcoming you to a job?
The fact these seem so real probably says more about influencers than AI.
So – should we be welcoming the combination of AI generated speech synchronized to AI generated avatar videos as powerful new communications tools… or decrying it as a plague on our YouTube / TikTok feeds?
The answer, as always, is “it depends.”
Why AI + Avatars Can Feel Uncanny
By now, most of us have heard the term “uncanny valley”, which the famous robotics professor Masahiro Mori coined in 1970 to describe how people feel a visceral sense of “fear and disgust” when a robot looks almost but not quite human or when human-looking robots move in unnatural, zombie-like ways. This is understandable, as our brains evolved to watch other humans carefully, and sound the alarm whenever something seems off (basically, when we see something that looks off, our subconscious minds wonder “Is this person sick with a plague?” or “Is this person getting ready to attack me?”.)

The ‘uncanny valley’ is why zombies will always be creepy.
And the trouble with the uncanny valley is that the more human something looks, the more any inconsistencies will disturb people. A fairly recent study by Stanford University showed that the less human an avatar looked, the more people were willing to overlook any weird movements. In short, the more realistic an avatar looks, the higher the bar for authenticity rises – which can cause realistic avatars to backfire with audiences.
This is what led the animators at Pixar to choose a stylized look for THE INCREDIBLES rather than aiming for complete realism like the animated people in THE POLAR EXPRESS, which critics blasted as unsettling.
Now, whether today’s AI video generators have bridged the uncanny valley is a matter of opinion. However, if you watch a typical AI video long enough you will notice certain gestures – an arched eyebrow, a hand motion, a head nod – repeating at intervals of 20 to 40 seconds, on a loop. Even if your conscious brain doesn’t notice, your ever-vigilant subconscious is keeping track.
Whether this poses a problem depends on how long you intend for people to stare at an avatar, which raises our next point…
When AI + Avatars Isn’t Even Necessary
Sometimes, the desire for AI talking heads leads people to use avatars even in situations where audiences wouldn’t want to look at an actual human.
For instance, while research on extended avatar interactions is limited, there’s extensive research on how people feel about interacting with other humans on camera in Zoom calls. During the pandemic, many managers became obsessed with the idea of using cameras to ensure people were paying attention during virtual meetings. However, research by Stanford University’s Virtual Human Interaction Lab suggests that traditional phone calls – without video – are actually better for many types of communication.
To give an oversimplified synopsis of the study’s findings: most humans instinctively make eye contact during in-person conversations to read the other party’s emotional cues, but too much eye contact can be distracting or uncomfortable. And while most people will periodically look away from the other party during in-person conversations, having a camera pointed at you during a Zoom call creates an unnatural expectation to keep staring at the faces on the screen – leading to a form of mental exhaustion that scientists have dubbed “Zoom fatigue”. The same effect can carry over to extended interactions with unblinking AI video avatars.
Another related bit of research shows that – while humans do prefer to be able to see facial expressions and body language in personal or emotionally charged conversations – we really don’t want to see the other person during other types of interactions. In a study of 12,000 consumers and 2,000 businesses, when people need to actually get something done – whether it’s checking their bank balance, resolving an insurance claim, or consulting with healthcare providers – they overwhelmingly prefer voice over video, specifically:
- 32% prefer a basic phone call
- 20% choose email
- 12% opt for text messaging
- Only 6% want video calls
And the good news here is that voice-only conversations are something AI agents can already do really well, with little or no uncanniness. By trying to add 3D avatars to every AI interaction, we might be solving a problem that doesn’t exist – or even un-solving a problem we’ve already solved.
This is probably why earlier-generation AI agents like Alexa, Siri, and Google Assistant have managed to achieve widespread adoption despite (or perhaps because of) the fact that they’ve never had avatar faces.
When AI + Avatars Creates Connection
So is there zero value in incorporating video avatars into training simulations or other types of AI interactions? No, it just depends on the specific purpose.
Long before we started developing AI tools for workforce training, our company was creating traditional classroom and online / multimedia training content for major organizations. And whether we’re designing training for appliance repair technicians or coaching programs for C-Suite executives, we start by asking “What does our audience need to do in order to succeed at their job?”
In some cases, reading other people’s facial expressions and body language is definitely part of the job task. And, unsurprisingly, traditional e-learning development tools like Articulate, iSpring, and BranchTrack have included pre-scripted, choose-your-own-adventure style conversation simulations as part of their standard features for nearly a decade.

iSpring Talkmaster
While the characters in these activities aren’t animated, they do allow people to gauge the emotional reaction of the character, without coming across as creepy. And that’s why, when our company incorporated avatars into our role play simulations, we opted for photorealistic but non-animated characters, like in the customer service demo.

In addition to avoiding uncanniness, going with the static, 2D photo avatars solved two other problems:
- First, it’s much easier to create custom characters for a particular organization, industry, or region (e.g. if we want to create a collection of characters who look like realistic bank customers or restaurant patrons in Mexico or Thailand)
- Second, real-time “streaming avatar” apps are expensive – ranging from 15 cents to 75 cents per minute, or in some cases more. And while that might not seem like much in the context of a one-off demo, if you had 300 salespeople each do a 10-minute roleplay activity three times at the top of that range, that’s an added $6,750 in delivery costs which – depending on your program’s budget – might or might not pose a problem.
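The delivery-cost math above is easy to sketch out for your own program. Here’s a minimal back-of-the-envelope calculator (the function name and the per-minute rates are illustrative; actual streaming-avatar pricing varies by vendor and plan):

```python
def streaming_avatar_cost(learners, minutes_per_session, sessions, rate_per_minute):
    """Estimate total delivery cost for real-time avatar roleplays.

    Multiplies total usage minutes by a per-minute streaming rate.
    """
    total_minutes = learners * minutes_per_session * sessions
    return total_minutes * rate_per_minute


# The scenario from the article: 300 salespeople, three 10-minute roleplays,
# priced at the high end of the 15-75 cents/minute range.
high_end = streaming_avatar_cost(300, 10, 3, 0.75)
low_end = streaming_avatar_cost(300, 10, 3, 0.15)
print(f"${low_end:,.2f} to ${high_end:,.2f}")  # $1,350.00 to $6,750.00
```

Even at the low end of the range, the cost scales linearly with headcount and practice time – which is exactly what makes static 2D avatars attractive once a program moves past the pilot stage.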
To Avatar… or Not To Avatar?
As we said at the beginning, the answer to whether it’s worth integrating animated video avatars into your AI agents is “it depends”. Generally, we would recommend avatars for training job skills that require attention to facial expressions and body language (e.g. in-person customer service interactions, difficult conversations as a manager, patient communication skills for healthcare), but even then a static photorealistic avatar might be preferable, since:
- It allows us to depict a wider range of characters, and customize them for our clients’ industries and markets.
- It bypasses the uncanniness of human-looking entities moving robotically.
- It takes the pressure off the learner to maintain eye contact with an unnaturally unblinking entity and lets them instead examine the customer / patient / interviewee without the machine seeming to stare back at them.
- At scale, it costs far less than avatars that move (for now), which matters when you start taking AI training beyond the small pilot project stage.
Meanwhile, we would not recommend avatars for any application where people wouldn’t even want to video chat with a human (customer service, certain purely technical coaching and support interactions) – at least not until avatars can manifest as holograms in the same room with you (and perhaps not even then).
Hopefully this article offered some useful insights into the technology, psychology, and practical aspects of incorporating avatars into AI agents and training simulations. If you would like to discuss this or any other applications of AI for training and on-the-job support, please consider reaching out to Parrotbox for a consultation.
Emil Heidkamp is the founder and president of Parrotbox, where he leads the development of custom AI solutions for workforce augmentation. He can be reached at emil.heidkamp@parrotbox.ai.
Weston P. Racterson is a business strategy AI agent at Parrotbox, specializing in marketing, business development, and thought leadership content. Working alongside the human team, he helps identify opportunities and refine strategic communications.
If your organization is interested in developing AI-powered training solutions, please reach out to Sonata Learning for a consultation.