Case Study: Creating Healthcare Simulation Training with AI

Using simulations to train healthcare professionals is a very old idea.  As early as 800 BCE, Indian doctors practiced their suturing skills on models made of leather and lotus leaves. And while today’s models are more advanced – ranging from sophisticated ‘manikin’ dummies to VR and hologram patients – simulation is still a core part of every professional’s training. 

Our own company, Sonata Learning, has developed simulations and role-play exercises for many top healthcare institutions, from the American Medical Association to the Zimbabwe Ministry of Health, addressing skills ranging from surgery to quality improvement, patient experience, and scheduling.  But while we’ve found simulation activities to be highly effective, they traditionally required a fair amount of time and effort to develop and deliver.   

However, more recently, our team has been experimenting with generative AI to produce interactive healthcare simulations on demand, and the results have been quite promising.  In this case study, we’ll review how we designed a simulation that was able to replicate the demanding job of a charge nurse, managing the day-to-day operations of a hospital department.


From Point-And-Click to Generative AI

In the past, most of our work creating healthcare training involved either instructor-led workshops or e-learning.  And while we incorporated simulations and role-play activities into both, there were practical limits on how much practice those formats could provide.  

There was only so much time that live instructors (usually department-level managers) could spend facilitating role-play activities, and traditional “point and click” e-learning activities had to be fully scripted and could only be played once or twice before the learner exhausted most of the choose-your-own-adventure options.  And while we did have the chance to stage larger-scale simulation exercises involving actors playing patients, those exercises were too expensive to conduct regularly.

By contrast, generative AI made the process of creating new and different scenarios easy.  Once the model was provided with the basic rules for creating authentic situations and evaluating the user’s choices (depending on the task being simulated), it could fill in the details with knowledge gleaned from the internet – simulating anything from an automobile accident victim arriving at an Intensive Care Unit to a parent bringing their child to a community clinic for an earache – without us having to script out every possible situation.

Creating a Proof-of-Concept

By the time our team started designing our first AI-generated healthcare simulation, we’d already created AI-powered training activities for financial advisors, customer service representatives, and firefighters. However, we knew the requirements for authenticity and realism were even higher when it came to healthcare.

On one hand, we had experience developing traditional training programs on everything from hospital triage to stroke rehabilitation, and were confident that – if we had a chance to work with a team of medical educators – we could convincingly simulate tasks involving diagnosis and treatment. That said, we were wary of trying anything too clinical as a proof-of-concept, knowing that if the demo got any details wrong it might discourage our healthcare clients from partnering with us on further development.

So, instead, we decided to start with a simulation based on the daily work of a hospital “charge nurse.” Charge nurses are essentially managers, overseeing all nurses within a specific hospital department. Their role involves both administration and certain aspects of patient care, allowing us to show that an AI simulation could recreate a realistically hectic hospital environment while consciously de-emphasizing the technical details of diagnosis and treatment.

Crafting the Simulation

While the final instructions for the AI (called the “prompt”) were about 8,000 words long, they didn’t contain much information on the types of cases hospitals treat or even the basic job description of a charge nurse: having been “trained” on the entire contents of the Internet, the AI model could find all that for itself.

Rather, our work focused on telling the AI how to take all of the information at its disposal and use it to create an authentic simulation of a typical 12-hour work shift at a hospital. And to do that, the first thing we had to deal with was how to track the passage of time.

Before that point, all of our simulations had focused on a single event or interaction: a sales meeting, a call to an alcoholism recovery hotline, or police officers responding to a crime in progress. However, a charge nurse might deal with dozens of cases over the course of a shift, and check back on all of them from time to time. So we needed to help the AI account for how much time passed from one episode to the next, and not lose track of all the situations requiring the user’s attention, such as a patient in crisis, an upset family member in the waiting room, or a shortage of necessary medical supplies.
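The bookkeeping described above can be pictured as a small state object. The Python sketch below is purely illustrative – the field names and numbers are our invention, and the actual prompt expressed these rules in plain English rather than code – but it shows the kind of shift state the AI had to maintain:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of per-shift state; names and values are illustrative.

@dataclass
class Storyline:
    description: str   # e.g. "upset family member in the waiting room"
    opened_at: int     # minutes since the shift started
    resolved: bool = False

@dataclass
class ShiftState:
    clock_minutes: int = 0                       # a 12-hour shift = 720 minutes
    storylines: list = field(default_factory=list)

    def advance(self, minutes: int) -> None:
        """Move the simulated clock forward between episodes."""
        self.clock_minutes = min(self.clock_minutes + minutes, 720)

    def open_storylines(self) -> list:
        """Situations still requiring the charge nurse's attention."""
        return [s for s in self.storylines if not s.resolved]

state = ShiftState()
state.storylines.append(Storyline("patient in crisis in Bed 4", opened_at=0))
state.advance(90)   # an hour and a half passes before the next episode
```

In effect, the prompt asked the AI to play the role of both `advance()` and the list of open storylines: move the clock forward between episodes, but never silently drop an unresolved situation.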

We also needed to consider how many new patients and cases should be introduced over the course of a simulation, when it was OK to stop tracking a particular storyline, and how many “decision points” the user would be presented with throughout their simulated workday. The number of decision points was especially tricky: if a patient was experiencing cardiac arrest, the user might need to make a series of split-second decisions, while during slower times the only important decision might be reminding an overworked nurse to take a break.

In the end, we tried multiple approaches – from specifying a set number of decision points per simulated hour to allowing the simulation to shift between “alert levels” dictating more or less frequent decisions – and simply kept playtesting until the pacing felt right.
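The “alert level” approach can be pictured as a pacing table that maps each level to a decision-point frequency. The numbers below are entirely made up for illustration – the real values were arrived at through playtesting, not a formula:

```python
# Illustrative pacing table: decision points per simulated hour at each
# alert level. These numbers are invented for the sketch.
DECISIONS_PER_HOUR = {"quiet": 1, "busy": 3, "crisis": 8}

def decision_points(schedule) -> int:
    """Total decision points for a shift, given (alert_level, hours) pairs."""
    return sum(DECISIONS_PER_HOUR[level] * hours for level, hours in schedule)

# A hypothetical 12-hour shift: 6 quiet hours, 5 busy hours, 1 hour of crisis.
total = decision_points([("quiet", 6), ("busy", 5), ("crisis", 1)])
```

A fixed number of decision points per hour corresponds to a table with a single alert level; letting the simulation shift between levels is what allows a cardiac arrest to demand eight rapid-fire decisions while a quiet overnight hour demands one.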

Testing For Authenticity

While a few of our team members had worked in healthcare settings, none of us felt qualified to judge whether the AI was getting the details right.  So, we recruited a trio of active and retired nurses (with backgrounds in ER, cardiac, and maternity care) to try out the simulation and provide our team of interaction designers (“prompt engineers”) with notes.

The nurses’ initial reactions were favorable, but not as overwhelmingly positive as we’d hoped.  “It’s pretty good for something generated by a machine,” said one after her first day of testing. “It wasn’t 100% realistic, but I’m surprised the AI got as much right as it did.”

The testers also provided notes about events in the simulation that broke their immersion and reduced the sense of authenticity.  

For instance, whenever there weren’t enough nurses on the floor or a piece of equipment broke down, the user could simply ask the administration to send more staff or new equipment, and their requests were never delayed or denied.  On the other hand, ambulances would sometimes bring in patients that the simulated hospital was not equipped to treat, forcing stroke victims to wait for hours to get a CAT scan – something that would be completely unacceptable at a hospital in the United States. The AI also seemed to overestimate the frequency of situations that are common in television medical dramas but rare in real life.

After two days of testing, the prompt engineers were able to distill the nurses’ list of issues down to 15 rules for the AI.  These included:

  • “If the user says they work in an Emergency Room, ask them if it is a trauma center.”
  • “If the hospital provides specialized treatment for a specific condition (e.g. a stroke center) then they will always have the expected resources for such a facility (e.g. CAT scans available on-demand for a stroke center) otherwise ambulances will not bring patients with those conditions to the hospital unless it is located in a severely under-resourced area (however, patients might walk in with these conditions on their own).”
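To make the mechanism concrete, here is a hypothetical sketch of how rules like these can be folded into the instructions sent to a chat-style model. The two rule strings are condensed from the examples above; the function and scaffolding are our illustration, not an excerpt from the actual 8,000-word prompt:

```python
# Hypothetical sketch: appending authenticity rules to a simulation prompt.
# The scaffolding is illustrative; the rules are condensed from tester notes.

AUTHENTICITY_RULES = [
    "If the user says they work in an Emergency Room, "
    "ask them if it is a trauma center.",
    "If the hospital provides specialized treatment for a specific condition, "
    "it always has the expected resources for such a facility.",
]

def build_system_prompt(base_instructions: str, rules: list) -> str:
    """Append numbered authenticity rules to the base simulation prompt."""
    numbered = "\n".join(f"{i}. {r}" for i, r in enumerate(rules, start=1))
    return f"{base_instructions}\n\nAuthenticity rules:\n{numbered}"

prompt = build_system_prompt(
    "You are simulating a 12-hour charge nurse shift at a hospital.",
    AUTHENTICITY_RULES,
)
```

Keeping the rules in a separate list like this also makes each round of tester feedback easy to fold in: new rules are appended without touching the base instructions.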

The lesson from all this was that, while the AI had a massive amount of information at its disposal, it was still oblivious to the “unwritten rules” of many professions, and needed guidance from experienced humans to make all the information fit together. 

When the team of nurses tried out the revised simulations incorporating their feedback, the experience was much more vivid.  “This (simulation) is giving me flashbacks,” said one of the testers.  “It makes me wonder how I did that job for thirty years!”


Since creating the charge nurse simulation as a proof of concept, Sonata Learning’s team has moved on to working with clients on AI training solutions for physiotherapy, substance abuse, and wound care.  And we’re excited to see these activities continue the three-thousand-year evolution of simulation training in healthcare.

Hopefully this case study offered some insights into how AI can be used for skill development in healthcare and other fields, as well as the art and science that goes into engineering AI-based simulations.

If you’re interested in discussing how your organization can leverage generative AI for workforce training and other uses, please reach out!

