SONATAnotes

The People vs. The Robots: A Non-Attorney’s Guide to AI & The Law

One of my first “real” jobs was working as an assistant at a law firm. On the first day, they taught us to preface any statements about legal matters with:

“I am not an attorney, and nothing I say should be taken as legal advice.”

So, in that spirit: I am not an attorney, and this blog about the legal aspects of using AI in the workplace should not be taken as legal advice.

From the light bulb to the smartphone, every revolutionary technology has sparked both wonder and fear. In my company’s work helping organizations integrate generative AI into employee training, we’ve encountered a range of concerns—some valid (“What if it makes a factual error?”), others more alarmist (“What if it steals all our data?”).

While we’ve written other blogs on topics like data privacy and factual accuracy, the legal questions surrounding AI are by far the most complex. In this blog, we’ll offer a breakdown of the state of AI and the law (circa late 2024), and the possible implications for organizations using AI in the workplace.

First, a Note on Terminology…

Precise language is critical in law. While the legal terminology for AI is still evolving, here’s how we’ll define a few key terms for the purposes of this blog:

  • AI Agent: A solution that uses generative AI to perform a task. For example, an AI tool designed to prepare taxes or teach users to read Swedish.
  • Large Language Models (LLMs): The engines powering AI agents, interpreting input and generating responses. Examples include OpenAI’s GPT models (which power ChatGPT), Anthropic’s Claude, and Google’s Gemini.
  • Prompts: Instructions provided by humans to guide the AI agent’s tasks, from a simple request for a definition to a detailed, 8,000-word framework for a marketing plan.
  • Session Input: Additional input AI agents receive and respond to during a session, beyond the initial prompt.
  • Training Data: The material used to train LLMs to recognize patterns, understand context, and form connections. This typically includes internet snapshots, books, and other texts. In some cases, AI models also draw on retrieval-augmented generation (RAG) sources—such as uploaded documents or search results—to generate more specific responses. (The short sketch after this list shows how prompts, session input, and RAG sources fit together in practice.)
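
To make these terms concrete, here is a minimal sketch of how an AI agent typically assembles a request to an LLM. It follows the chat-style message format common to most LLM APIs; the tutor instructions, retrieved passage, and user question are invented placeholders for illustration, not taken from any real product.

```python
# A minimal sketch of how an AI agent assembles a request to an LLM.
# The message format follows the chat-style convention used by most LLM APIs;
# the tutor instructions, retrieved passage, and user question below are
# invented placeholders, not taken from any real product.

# The prompt: standing instructions written by a human (or prompt engineer).
AGENT_INSTRUCTIONS = (
    "You are a Swedish reading tutor. Correct the learner's attempts and "
    "explain any grammar mistakes in plain English."
)

# RAG source: material retrieved at request time to ground the response.
retrieved_passages = [
    "Lesson 3: Swedish definite articles are suffixes (bok = book, boken = the book).",
]

# Session input: whatever the user types during this particular session.
user_message = "How would I say 'the book is on the table'?"

# The agent combines all three into one structured request for the LLM.
messages = [
    {"role": "system", "content": AGENT_INSTRUCTIONS},
    {"role": "system", "content": "Reference material:\n" + "\n".join(retrieved_passages)},
    {"role": "user", "content": user_message},
]

for message in messages:
    print(f"[{message['role']}] {message['content']}\n")
```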

Now, on to the big questions…

Ownership of Outputs

Imagine you built an AI agent with ChatGPT and instructed it to “Write a poem involving the motion of planets and a conversation over tea with a regret-filled old woman in the style of the late Nobel prize-winning Polish poet Wisława Szymborska” and the agent generated this:

Tea with the Cosmos

The planets revolve, faithful as clock hands,
each in their orbit, their designated path—
no sway, no doubt, no drifting off course.

At the kitchen table, the old woman stirs
her tea—one slow circle, counter-clockwise.
“I didn’t go to Paris when I had the chance,”
she says, tracing a map with her fingertip.
“I stayed. Thought love would anchor me—
Then, as you see, love went on without me.”

Out the window, the constellations rise.
Orion lifts his bow, shooting blindly.
Saturn tightens his belt of rings.
The old woman offers more tea,
but you decline. 

“I suppose you have to go,” she says,
though she holds your gaze a beat longer,
 as if to keep you in her orbit,
just a little longer than gravity allows.

So, who owns this poem?

  • The user? They initiated the task with the prompt.
  • The LLM Developer (e.g., OpenAI)?  They built the underlying model that produced the words.
  • The prompt engineer?  If this were a more complex AI agent with instructions written by a professional prompt engineer, they might be able to claim the outputs as products of a proprietary process.
  • The estate of Wisława Szymborska?  If the poem mimics her style too closely, they might have a claim.

One could make an argument for any or all of the parties above. While computer code is protected as a “literary work,” the legal standing of AI-generated outputs is less certain. The case of Whelan Associates, Inc. v. Jaslow Dental Laboratory, Inc. (1986) extended copyright protection beyond the literal lines of code to the underlying structure of a software program. However, courts have also ruled that the outputs of software—like documents created with a word processor—do not automatically belong to the software’s developer.

For their part, LLM developers like OpenAI and Anthropic explicitly state that they don’t claim ownership over user-generated outputs. But even that doesn’t automatically grant ownership to the user: recent decisions like Zarya of the Dawn (2023) complicate things further. In that case, the U.S. Copyright Office ruled that the human contributions to a comic book were copyrightable, but the purely AI-generated portions were not.

Mitigating Risk

Once again,  I am not an attorney, and this blog is not legal advice, but when our company builds custom AI agents for clients, we take the same stance as OpenAI and Anthropic.  Our license agreement states that, while we own the instructions our prompt engineers create, we don’t make any claim of ownership for the outputs generated by those instructions.

Use of Inputs

AI models generate outputs through pattern recognition and prediction. Developers feed vast amounts of “training data” into algorithms that analyze the relationships between words until the model is able to reasonably predict what sort of words it should generate when someone says “write some lyrics for a pop song.”

In that particular case, if the AI’s training data included a lot of Taylor Swift songs, then it’s likely the output will be vaguely Taylor Swift-like.  However, it’s also possible the AI model might adhere too closely to the training data and reconstruct a line or two from an actual Taylor Swift song.  And while there are things LLM developers and prompt engineers can do to reduce the chances of an AI agent violating copyrights, the risk is never zero.
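
To see why that can happen, here is a deliberately tiny, purely illustrative sketch of next-word prediction. Real LLMs use neural networks trained on trillions of words rather than simple word counts, but the underlying task is the same: predict a likely next word given the words so far. The lyric-like lines below are invented.

```python
from collections import Counter, defaultdict

# A toy "language model": count which word tends to follow which.
# Real LLMs use neural networks trained on trillions of words, but the core
# task is the same: predict a likely next word given the words so far.
training_data = [
    "dancing in the pale moonlight tonight",   # invented lyric-like lines
    "dancing in the rain all night long",
    "the stars are out tonight my love",
]

next_word_counts = defaultdict(Counter)
for line in training_data:
    words = line.split()
    for current, following in zip(words, words[1:]):
        next_word_counts[current][following] += 1

def generate(start_word, length=6):
    """Greedily pick the most common next word at each step."""
    words = [start_word]
    while len(words) < length:
        candidates = next_word_counts.get(words[-1])
        if not candidates:
            break
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)

# With so little data, greedy prediction can replay a training line verbatim...
print(generate("dancing"))  # -> "dancing in the pale moonlight tonight"
# ...or stitch fragments of several lines into something "new".
print(generate("the"))      # -> "the pale moonlight tonight my love"
```

At the scale of a real model, that kind of memorization is exactly the risk at issue.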

This has already led to several high-profile lawsuits, mostly targeting the major LLM developers: for instance, the New York Times suing OpenAI and Getty Images suing Stability AI. This raises the questions: “Why can’t AI companies avoid copyrighted material?” and “Why can’t AI simply cite its sources?”

The answer is that AI doesn’t “read” in the same way humans do—it identifies general patterns across entire libraries of text rather than “understanding” the content of individual works.  This makes it difficult to cite where, exactly, the AI model found its inspiration to generate a particular line of output.

Limiting access to training data also poses problems. Just as a child learns language through immersion, AI models rely on exposure to large datasets, often encompassing trillions of words. Denying models access to copyrighted works would lower the risk of unintentional copyright violations, but it could also reduce the capabilities of future models.

Mitigating Risk

Again, I am not an attorney, and this blog is not legal advice, but our company’s policy has been to explain the risks to clients, encourage them to review outputs before using them for something public, and clearly outline who would be liable in different cases as part of the license agreement.

Meanwhile, LLM developers like OpenAI and Anthropic are beginning to offer some level of legal protection for enterprise-level users, though organizations should carefully review the fine print to ensure nothing in their application disqualifies them from protection under the companies’ terms of use.

Liability

As organizations use AI to handle more and more tasks, new types of risk will emerge. For instance, who is liable if an AI financial advisor suggests illegal tax avoidance strategies?

Lawsuits against autonomous vehicle companies like Uber, Tesla, and Waymo suggest that courts would treat an AI agent like any other potentially defective product and distribute liability among all parties who contributed to its development. And while the vast majority of accidents involving autonomous vehicles result from human drivers breaking the law, any mishap can hurt AI companies in the court of public opinion.

Making Informed Decisions

As you know by now, I am not an attorney, and this blog is not legal advice. That said, organizations should approach AI adoption with a mix of caution, common sense, and clear, consistent protocols.

Generally, when a client requests a custom AI agent for a particular task, we outline our testing protocol and what it does and does not cover, while making it clear that the customer has an obligation to test the agent’s fitness for purpose before deploying it with their own customers or staff.

“Unknown Unknowns”

The legal landscape surrounding AI is filled with unknowns, but as the laws become clearer, new industries will likely emerge to help organizations ensure compliance. AI developers are already securing licensing agreements with publishers like Time and Condé Nast to use their content as training data, and we may eventually see AI agents certified to perform specialized tasks, much like doctors and lawyers require licenses today.

In the meantime, every organization must weigh the risks and benefits of using AI. Our company has already encountered situations where legal teams at two different Fortune 500 organizations in the same industry evaluated the same AI agent and came to different conclusions. One company’s legal department said, “Just show a disclaimer, and it’s fine,” while the other has been debating the issue for months.

Who’s right? Only time—and the outcomes of various ongoing legal proceedings—will tell.

If your organization is looking to leverage generative AI for workforce training or creating copilots / chatbots based on your knowledge products, please consider reaching out to Sonata Learning for a consultation.
