SONATAnotes
Parlez-Vous AI? How Generative AI Handles Different Languages.
Generative AI models can do many impressive things. Just the other day, I had ChatGPT instantly organize a list of thousands of medical diagnostic codes by how frequently doctors would encounter patients with each diagnosis. And every weekend, my 9-year-old picks a theme – from race cars to stars and planets – and our AI-powered math tutor immediately generates a set of story problems based on that theme for his algebra and geometry practice.
But of all generative AI’s abilities, perhaps the most impressive is the ability of models like ChatGPT and Claude to generate output in almost any human language – from Czech to Malay to Xhosa – with a proficiency that would take a human years to acquire.
This power has revolutionized my own company’s work developing workforce training programs for international corporations and nonprofits. Traditionally, it took considerable effort to make videos, manuals, and eLearning courses available in multiple languages. But since we started creating interactive training resources with generative AI, translation is as simple as saying “Please conduct this role play simulation in French” – et voilà.
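(If you’re building these activities against an API rather than chatting directly, that instruction is just another line in the prompt. Here’s a minimal sketch using OpenAI’s Python client – the model name and prompt wording are illustrative choices on our part, not a fixed recipe.)

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# The entire "translation" step is a single instruction in the system prompt.
response = client.chat.completions.create(
    model="gpt-4o",  # any recent multilingual chat model should work
    messages=[
        {
            "role": "system",
            "content": (
                "You are a role-play partner for a workplace training simulation. "
                "Please conduct this role play simulation in French."
            ),
        },
        {"role": "user", "content": "Bonjour, je suis prêt à commencer."},
    ],
)

print(response.choices[0].message.content)
```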
That said, everyone knows by now that AI models never do anything with 100% accuracy 100% of the time. So exactly how good is ChatGPT’s Spanish or Gemini’s Gujarati or Claude’s Cantonese? While the answer to that question gets a little complicated, we’ll try to clarify it in this article.
How Good is AI Translation?
AI models don’t actually care about languages. As long as they are trained on a large enough sample of content in a given language, most models can analyze input in that language and predict what word or phrase should come next in a conversation just fine.
However, things get dicier when the volume of training data in a given language drops below a certain threshold. For instance, Wolof and Albanian have roughly 8 million and 7.5 million speakers globally, but the amount of Wolof and Albanian content on the Internet is just a drop in the bucket compared to “mainstream” languages like English, Spanish, Chinese, Hindi, or Russian.
From what I’ve heard from speakers of less common languages, models like ChatGPT can communicate in their languages well enough to be understood, but they lack the impressive fluency and nuance displayed with more widely spoken languages. These challenges become even more pronounced with “high-context” languages, which rely heavily on nonverbal cues or subtle hints to convey a speaker’s true feelings (while maintaining a superficially polite tone).
Another important caveat is that LLMs analyze the instructions you give them in much the same way they generate output. This means they are more likely to correctly follow instructions written in a language they’ve been heavily exposed to, compared to a language for which they have less training data. For example, if you ask an AI model to “provide step-by-step guidance on how to install solar panels on a house, beginning with a list of tools and materials required,” it will have access to far more examples of similar tasks in English than in Gaelic.
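One practical workaround follows from this: keep the complex instructions in a well-represented language like English, and only request that the output be written in the target language. A rough sketch of that pattern (the function and prompt wording are our own, for illustration):

```python
from openai import OpenAI

client = OpenAI()

def solar_panel_guide(output_language: str) -> str:
    """Give the model a complex, structured task in English, but ask
    for the final answer in the target language."""
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative choice
        messages=[
            {
                "role": "user",
                "content": (
                    "Provide step-by-step guidance on how to install solar "
                    "panels on a house, beginning with a list of tools and "
                    "materials required. "
                    f"Write your entire answer in {output_language}."
                ),
            }
        ],
    )
    return response.choices[0].message.content

print(solar_panel_guide("Scottish Gaelic"))
```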
Is It Better To Use an LLM Trained on Language-Specific Content?
Given that popular AI models like ChatGPT and Claude struggle to capture the nuances of certain languages, it might seem sensible to switch to specialized models trained on Chinese, Arabic, or Italian content instead. However, the decision of whether to use an AI model trained on content in a specific language versus having a more powerful model (trained on a wider sampling of web content) translate its answers involves complex trade-offs.
For instance, tech giants Baidu and Tencent offer AI models trained specifically on Chinese-language content, which outperform ChatGPT and Claude on tasks requiring specific knowledge of Chinese literature and cultural references. However, if you’re just looking for an answer to a question about finance or engineering, or need the model to follow a highly complex set of instructions, having a mainstream model translate its answers into Mandarin might be a better bet.
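That “translate its answers” approach can be made explicit as a two-step pipeline: generate the substantive answer first, then translate it in a second call. A rough sketch, again assuming OpenAI’s Python client (a single combined prompt often works, too):

```python
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"  # illustrative choice

def answer_then_translate(question: str, target_language: str) -> str:
    # Step 1: let the model answer in English, where it has the
    # deepest pool of training data for technical subject matter.
    answer = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": question}],
    ).choices[0].message.content

    # Step 2: translate the finished answer into the target language.
    translated = client.chat.completions.create(
        model=MODEL,
        messages=[
            {
                "role": "user",
                "content": (
                    f"Translate the following into {target_language}, "
                    f"preserving technical terminology:\n\n{answer}"
                ),
            }
        ],
    ).choices[0].message.content
    return translated

print(answer_then_translate("How does compound interest work?", "Mandarin Chinese"))
```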
Of course, that assumes the model is capable of translating its own answers accurately. And while mainstream AI models are generally proficient in the better-represented languages on the Internet, when it comes to less-represented languages it can be helpful to have a specialized model.
For example, our company recently had an opportunity to bid on a project to create a healthcare copilot / chatbot for audiences in Thailand. When we asked our regular ChatGPT-based healthcare copilot whether it felt comfortable answering medical questions in Thai, the AI agent replied, “While I can quote Thai poetry and hold a conversation, I’m not sure I’m quite up to the task of answering nuanced clinical medical questions in Thai… sorry.” Eventually we found an open-source model trained on Thai, though whether that model is up to handling the medical aspects of a conversation remains to be determined.
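For anyone curious what adopting an open-source model looks like in practice, such models are typically loaded through Hugging Face’s transformers library. A sketch of the pattern – note that the model ID below is a hypothetical placeholder, not the model we evaluated:

```python
from transformers import pipeline

# "some-org/thai-instruct-llm" is a hypothetical placeholder; substitute a
# real Thai-language model from the Hugging Face Hub after evaluating it.
generator = pipeline(
    "text-generation",
    model="some-org/thai-instruct-llm",
    device_map="auto",  # requires the accelerate package for GPU placement
)

# "What should a patient do when they have a high fever?"
prompt = "ผู้ป่วยควรทำอย่างไรเมื่อมีไข้สูง"
result = generator(prompt, max_new_tokens=200)
print(result[0]["generated_text"])
```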
Maybe It Can Write… But How Well Does It Speak?
Synthesized voices can significantly enhance AI interactions, but many of today’s higher-quality AI voices have an unmistakable U.S. accent. For example, OpenAI’s expressive voice models can speak Spanish, French, Arabic, or Chinese fluently, but they often sound like an American university student who aced a literature course in those languages—without ever visiting the countries themselves. Even for English, it’s difficult to find a high-quality voice model with a South African, New Zealand, Boston, or Texas accent.
A few third-party providers, like ElevenLabs, offer a wider range of accents and even the ability to create custom voice models based on specific individuals. However, these services can cost up to 15 times more than OpenAI’s, which poses a challenge for organizations aiming to deploy diverse AI voices at scale.
For what it’s worth, OpenAI has ongoing projects to develop voice models capable of speaking naturally in languages like Swahili and Japanese, but it is unknown when those voices will be available to the public or to developers. Currently, OpenAI’s voice collection offers only six voices outside of developer previews, five of them American-accented and one British.
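If you want to hear the accents for yourself, OpenAI’s text-to-speech endpoint lets you pick among those voices. A minimal sketch (model and voice names as of this writing – check OpenAI’s current documentation; the association of “fable” with the British accent is our observation, not an official label):

```python
from openai import OpenAI

client = OpenAI()

# The six standard voices at the time of writing: alloy, echo, fable,
# onyx, nova, shimmer. "fable" is the one we'd describe as British-accented.
speech = client.audio.speech.create(
    model="tts-1",
    voice="fable",
    input="Bonjour! Essayons cette simulation de formation en français.",
)

# Write the returned audio to disk for playback.
speech.write_to_file("sample.mp3")
```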
Conclusion
I grew up speaking English but spent one month as an adult working with a team in Spanish and another month working with a different team in German. After those experiences, I’m impressed by anyone who can get through a work day in their second or third language, and that’s probably a good way to look at generative AI’s current multilingual capabilities.
While it would be nice to give audiences in every country access to AI applications that sound like their friends and neighbors, having AIs that write and speak any language like a well-educated foreign diplomat is a start. And even at this level, the things AI’s language abilities allow us to do in the workplace are nothing short of revolutionary.
If you’re interested in seeing how AI handles a workforce training activity in multiple languages, check out this anti-money laundering quiz. Otherwise, if you’re interested in using generative AI for workforce training and on-the-job performance support, please consider reaching out to Sonata Learning for a consultation.