SONATAnotes
Dehydrate Your Data: Making Critical Information Accessible for Both Humans & AI

Years ago, a customer on the Air Canada website asked the airline's chatbot whether it offered a bereavement discount, so he could travel to his mother's funeral. The chatbot said he could book a full-fare flight, then apply for a rebate after the fact. But when he went to claim the discount, the airline refused: bereavement discounts had to be requested in advance, and had he followed the link to the official policy that the chatbot itself provided, he would have known that.
Understandably, the customer objected, and eventually a court ordered the airline to pay him the rebate plus litigation costs.
In the timeline of AI evolution, the year 2022 – when this incident happened – may as well be the Stone Age. However, the underlying challenges of having AI agents reference organization-specific data still matter today. While case studies are hard to come by (businesses rarely advertise their internal debacles), an anonymous poll of CIOs by Dataiku found that 59% had already experienced a serious business incident due to AI inaccuracies over the past year.
At the core of this problem is the fact that, despite their access to vast pools of data, AI models have a blind spot when it comes to organization-specific or paywalled information sources. In the 1980s, the science fiction writer William Gibson observed that some of the most valuable information about the future of society and technology existed in "invisible media": all the trade magazines, scientific papers, government publications, and corporate R&D documents that the average person couldn't access or couldn't be bothered to read. While the internet and AI have made much of this invisible media accessible – allowing fitness enthusiasts to find the latest studies on dietary supplements, or maintenance technicians to locate the manual for a discontinued industrial fan – there is still a massive underground layer of information that can't be found via Google and never makes it into the training data of the big commercial AI models.
Facing this, many of our clients initially ask, “Can we train an AI model on our data?”
This question usually comes from organizations that have already done the hard work of centralizing their documentation (getting everything into SharePoint, building out their knowledge bases, digitizing their procedures), but it reflects a common misunderstanding of what it means to "train a model." In reality, making data accessible to AI takes more than gathering all your documents in one place and telling the AI to "do a search." But for those willing to take the next step and put in the work to make their information truly AI-accessible, that is where the real value of AI gets mined.
So, what does it actually take to make organizational data accessible and usable for today’s AI models?
AI Doesn’t “Read”

The first time you upload a document into an AI chatbot and ask “What does this research paper say about the effects of heat treatment temperature on titanium alloys?” and receive a coherent answer, it can feel like a revelation.
However, if you try this often enough, you'll eventually hit a situation where you ask the chatbot "What does the company policy say about connecting personal devices to the network?" and the chatbot replies "It depends on the specifications of the device and any relevant data security concerns" when the actual text of the policy says "personal devices are not permitted under any circumstances."
When this happens on an individual level, it can be frustrating. But when these types of mistakes happen repeatedly across an organization they can create serious operational risk.
So, how can an AI model capable of producing correct answers for complex legal questions or solving difficult engineering problems fail to understand something written so clearly in a document?
The answer lies in the fact that large language models do not read documents the way humans do, starting at the beginning and moving to the end. Instead, imagine taking every word from every document in your organization, writing each word on a separate index card, and then connecting those cards with thousands of colored strings based on how often those words appear near each other. Then imagine doing that not just for your organization's documents, but for your documents plus every other piece of information the AI model was initially trained on, which, for a large language model, typically includes millions of books, websites, and articles.
When you ask an AI model a question about a document, it’s looking at that massive, tangled web with millions (and perhaps billions or trillions) of connections and trying to follow the strongest threads to an answer. Sometimes those threads lead to the right place. Sometimes they lead somewhere that seems right, but isn’t.
This is why the same AI model that can write elegant code or analyze complex legal arguments might confidently tell a customer that your return policy is 60 days when it’s actually 30 – the model found a strong pattern connecting “return policy” and “60 days” somewhere in its training data, even though your specific policy says something different.
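For intuition, here is a toy sketch of that word-association idea in Python. It merely counts nearby word pairs (real models learn vastly richer patterns), but it shows how the "strings" get their strength from frequency rather than from reading:

# A toy version of the "index cards and strings" analogy: count how often
# words appear near one another. Real models learn far richer patterns,
# but the principle of frequency-driven association is similar.
from collections import Counter

# Imagine the training data mentions "60 days" return policies more often
# than your actual 30-day policy.
text = (
    "our return policy is 30 days "
    "many stores advertise a return policy of 60 days "
    "a generous return policy of 60 days builds trust"
)
words = text.split()

WINDOW = 3  # how close two words must be to get a "string" between them
strings = Counter()
for i, word in enumerate(words):
    for neighbor in words[i + 1 : i + 1 + WINDOW]:
        strings[tuple(sorted((word, neighbor)))] += 1

# "60" ends up tied to "policy" by more strings than "30" does, even though
# the 30-day line is the one that states your actual policy.
for pair, strength in strings.most_common(5):
    print(pair, strength)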
The challenge, then, is to structure and organize your data in a way that minimizes the number of threads an AI model needs to analyze, while maximizing the probability that it will make the right connections to your information, not just plausible-sounding information from its general training.
Se habla IA

Having prepped millions of words of client data for AI consumption, our team has hit upon various guidelines we call ‘AI grammar’ to reduce and rearrange words in a way that streamlines AI analysis.
If we rephrased "Misty the cat was playful, and her fur was a mix of white, brown and gray. However, Max the black cat was mean." as
CATS
[Misty]
– Fur: white brown gray
– Playful
[Max]
– Fur: black
– Mean
…an AI model wouldn't regard that as awkward; to the model, the rephrased version is simply about 20% more economical, with a more consistent structure.
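You can check the savings yourself by counting tokens. Here is a minimal sketch, assuming the open-source tiktoken tokenizer; the exact figure will vary by model and tokenizer:

# Compare token counts before and after "dehydrating" the cat description.
# Assumes tiktoken (pip install tiktoken); savings vary by tokenizer.
import tiktoken

prose = (
    "Misty the cat was playful, and her fur was a mix of white, "
    "brown and gray. However, Max the black cat was mean."
)
dehydrated = "CATS\n[Misty]\n- Fur: white brown gray\n- Playful\n[Max]\n- Fur: black\n- Mean"

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by several OpenAI models
before, after = len(enc.encode(prose)), len(enc.encode(dehydrated))
print(f"{before} tokens -> {after} tokens ({1 - after / before:.0%} saved)")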
We also remove anything that interrupts descriptions of processes and frameworks (e.g. an anti-fraud handbook including an illustrative case study midway through a list of common money laundering schemes) or reiterates public knowledge that an AI model would already possess from its training data (for example, a glossary of everyday terms like "carbon footprint" at the start of a company's environmental policy).
Deciding what to condense or remove and what to leave in is as much art as science. The math that AI models apply when analyzing data is too complex for humans to understand on anything but an abstract level. However, once your team has logged sufficient hours working with AI agents and seeing how the format of inputs impacts the quality of outputs, they can start developing a sense for how to make things easier for the machine. And while the difference between 85% accuracy versus 95% accuracy might seem negligible when you’re just asking ChatGPT for help with some ad-hoc research, that difference can be massive if you’re entrusting AI to answer customer service questions or guide an industrial equipment manufacturer’s repair technicians in the field.
Stocking the Shelves

We already discussed how AI models don't read; it's also worth pointing out that they don't "learn," either. An AI model ingests massive amounts of data during its development – an energy- and labor-intensive process that can take several months and cost millions to do properly – after which it uses that pool of data to inform its future responses.
A version of ChatGPT with a training cutoff of September 2025 won't know who won the gold medal for women's figure skating at the 2026 Winter Olympics unless you keep handing the model a list of medalists every time it generates a new message. And the same goes for any recent, organization-specific, or otherwise non-public data.
While individuals can upload documents into a generic AI chatbot, and some platforms (e.g., ChatGPT, Claude, Gemini) allow teams to maintain an always-uploaded set of reference documents or connect their Google Drive / OneDrive accounts, none of this is a solution for making massive volumes of data available to AI on an enterprise scale.
The trick with enterprise-level data management is to set up systems where AI agents can locate relevant and up-to-date organizational knowledge when they need it without flooding their working memory with irrelevant information from somebody’s poorly organized Google Drive account.
Fortunately, this is a problem that's already been solved by existing enterprise-grade documentation tools. Most large organizations with vast bodies of mission-critical knowledge – major engineering firms, pharmaceutical manufacturers, insurance companies – use some kind of "Component Content Management System" (CCMS) to handle their documentation. With these systems, if you have a set of insurance regulations that are 80% the same across 100 different jurisdictions, then instead of maintaining and updating 100 manuals for insurance agents, you keep the data broken up into smaller chunks – some generic, some jurisdiction-specific – that the CCMS assembles into a manual for a particular jurisdiction when needed.
By integrating AI agents with this type of platform (plus your organization's other sources of truth – ERP, patient records, CMS, etc.), you can let the agent selectively query the most relevant and up-to-date organizational knowledge in the moment, instead of having it search a massive, disorganized library Google-style. For instance, an agent helping with chemical factory maintenance might query "What models of centrifugal pumps do we have documentation for?", then, based on the results, "What is the outline for the Goulds 3196 centrifugal pump maintenance manual?", and then make one more query on the specific section(s) of interest, allowing it to pull up the 18 most relevant pages of documentation instead of the full 200-page manual.
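In code, that narrowing pattern looks something like the sketch below. FakeStore and its keyword scoring are hypothetical stand-ins for whatever search API your CCMS or document platform actually exposes:

# A toy version of the narrowing-query pattern. FakeStore stands in for a
# real CCMS / document store; a production agent would call its search API.

class FakeStore:
    """Maps chunk titles to chunk text; ranks results by keyword overlap."""

    def __init__(self, chunks: dict[str, str]):
        self.chunks = chunks

    def search(self, query: str, k: int = 1) -> list[tuple[str, str]]:
        terms = set(query.lower().split())
        ranked = sorted(
            self.chunks.items(),
            key=lambda kv: len(terms & set(kv[0].lower().split())),
            reverse=True,
        )
        return ranked[:k]

store = FakeStore({
    "Goulds 3196 pump manual outline": "1. Safety  2. Seals  3. Bearings ...",
    "Goulds 3196 pump seal replacement": "Lock out the pump, drain the casing ...",
    "Durco Mark 3 pump manual outline": "1. Safety  2. Impeller  3. Seals ...",
})

# Step 1 (broad): what pump documentation do we have at all?
catalog = store.search("pump manual outline", k=2)
print([title for title, _ in catalog])

# Step 2 (narrow): fetch only the chunk relevant to the task, rather than
# loading an entire manual into the agent's context window.
(title, text), = store.search("Goulds 3196 seal replacement", k=1)
print(title, "->", text)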
[Illustration: AI agents pulling data from various public platforms and databases.]
Fast Casual: Can You Do Enterprise Data Management with General-Purpose Tools?

At this point, you’d be forgiven for thinking “This sounds complicated and expensive, and we’ve tried and failed to keep our documentation and data organized in the past. Can’t we just connect ChatGPT or Claude to our Google Drive / SharePoint and leave it at that?”
The honest answer is: for individual use cases, maybe. If you’re a solo consultant or small business that wants to upload your client files or past proposals and ask questions about them, the generic AI chatbots can work reasonably well.
But here’s what changes at enterprise scale:
First, volume. When you’re running thousands or tens of thousands of AI interactions daily, small error rates compound quickly. A 5% error rate might be acceptable for personal use, but when you’re deploying AI to support customer service, clinical decisions, or regulatory compliance, that same 5% can represent hundreds of mistakes per week.
Second, consistency. Generic chatbots don’t maintain consistent behavior across your organization. The same question asked by two different employees might get different answers depending on what versions of which files they uploaded, or how many old drafts and irrelevant documents they have sitting in their personal Google Workspace / Microsoft 365 folders.
Third, governance. When something goes wrong (and at scale, something will eventually go wrong) you need to be able to trace exactly what information the AI accessed, why it gave the answer it did, and how to prevent similar errors in the future. The standard AI chatbots aren’t built for that kind of accountability.
Finally, integration. Your organizational knowledge doesn’t live in a handful of PDFs – it lives in your ERP system, your CRM, your quality management system, your training platforms, and dozens of other sources. Making all of that accessible to AI in a reliable, access-controlled way requires infrastructure that goes well beyond uploading files or connecting folders to a shared ChatGPT / Claude projects space.
This doesn’t mean AI is only for enterprises with massive budgets. It means that organizations serious about getting value from AI technology need to think beyond AI as a personal productivity assistant and plan for scale.
Heat and Serve

While AI agents are an important new audience for organizational knowledge, we don't want to create parallel "AI" and "human" data sources within organizations, as that is a recipe for the two drifting out of sync.
So, what would a unified human / AI knowledge management strategy look like?
It will probably be "AI first": storing AI-optimized content on platforms that AI agents can query without humans dragging and dropping documents. Then, if a human ever wants to read it, an AI agent can "rehydrate" the data, adding back the extra words and illustrative examples that make knowledge useful for humans.
Going back to our previous example, if we had information about cats stored in this format for AI:
CATS
[Misty]
– Fur: white brown gray
– Playful
[Max]
– Fur: black
– Mean
We'd simply need to tell an AI agent "Generate a plain language description of these two cats, emphasizing brevity, no bullet points" to get:
“Misty is a playful cat with white, brown, and gray fur. Max is a mean black-furred cat.”
This same process could be performed on an enterprise scale, either generating human-readable documentation on demand from an AI-optimized CCMS, or automatically generating a linked human-readable article in the CCMS whenever updates are made to the AI version.
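As a minimal sketch, rehydration is a single LLM call. This example assumes the openai Python SDK and a placeholder model name; any LLM API could play the same role:

# On-demand "rehydration" of AI-optimized content into human-readable prose.
# Assumes the openai SDK (pip install openai) and an OPENAI_API_KEY in the
# environment; swap in whatever model and API your stack actually uses.
from openai import OpenAI

DEHYDRATED = """CATS
[Misty]
- Fur: white brown gray
- Playful
[Max]
- Fur: black
- Mean"""

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder choice, not a recommendation
    messages=[{
        "role": "user",
        "content": (
            "Generate a plain language description of these two cats, "
            "emphasizing brevity, no bullet points.\n\n" + DEHYDRATED
        ),
    }],
)
print(response.choices[0].message.content)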
Conclusion
Knowledge management isn’t something that the big AI tech companies like to talk about as they chase the holy grail of “Artificial General Intelligence” (AGI) – a mythical AI model so smart that it knows everything and can do anything right out of the box. The notion that AI models might need extensive data plumbing and traditional workflow scaffolding in order to be useful undercuts that narrative.
But here’s what we’ve learned after years of implementing AI solutions across industries: the unglamorous documentation and integration work isn’t a necessary evil – it’s where the actual value gets created. AI models themselves are growing increasingly commoditized. What’s not commoditized is the expertise to structure organizational knowledge so AI can reliably access it, use it correctly, and deliver consistent results at scale.
Organizations that invest in this foundation today aren’t just ‘better positioned’ – they’re building a genuine competitive advantage. While their competitors are still debugging chatbots that give plausible-sounding wrong answers, they’re deploying AI that actually works: answering customers’ questions accurately, helping technicians troubleshoot faster, supporting clinicians with reliable information, and automating workflows that used to require extensive human oversight.
The question isn’t whether to do this work. The question is whether to do it right the first time, or learn these lessons the expensive way.
At Sonata Intelligence, we’ve spent over a decade helping organizations structure their knowledge for effective learning and performance support – first for human learners, and now for AI agents. If you’re exploring how to make AI work reliably with your organizational knowledge, contact us for a consultation.


Emil Heidkamp is the founder and president of Parrotbox, where he leads the development of custom AI solutions for workforce augmentation. He can be reached at emil.heidkamp@parrotbox.ai.
Weston P. Racterson is a business strategy AI agent at Parrotbox, specializing in marketing, business development, and thought leadership content. Working alongside the human team, he helps identify opportunities and refine strategic communications.





