What Is an AI Data Readiness Audit? How to Assess Your Internal Data Before Implementing Agentic AI

AI Data Readiness Audit: A systematic process for evaluating internal data quality and preparedness to safely and reliably deploy agentic AI. This article explains concrete audit procedures and passing criteria for IT departments and corporate planning staff considering AI adoption, and introduces methods to eliminate implementation failure risks in advance.

An AI data readiness audit is the process of systematically evaluating the quality, accessibility, and structure of internal data in order to safely and reliably deploy agentic AI.

Many AI implementation failures are attributed not to model performance, but to data-side issues. Incomplete data lineage, fields whose definitions vary across departments, confidential information with inadequate access controls—a "data readiness audit" is the mechanism for identifying and correcting these problems before they arise.

This article is primarily intended for IT department staff and corporate planning professionals, and covers the following:

  • The 4 steps of an audit (inventory, quality assessment, compliance verification, and scoring)
  • Specific checklist items to review at each step
  • How to translate audit findings into an improvement roadmap

By the end of this article, you will have a clear basis for determining whether your organization's data is in a state that can withstand agentic AI deployment.

A company that can say "we are ready to adopt AI" and a company that can say "our data is in a state where AI can actually use it" are, in reality, two different things.

An AI data readiness audit refers to the process of systematically evaluating whether the data that forms the foundation of agentic AI is truly usable before deploying it within an organization. It is, in essence, the work of re-examining the health of the data itself—prior to selecting any tools or models.

While the term "AI-ready" is often used in the context of meeting infrastructure and security requirements, data readiness goes a step further, examining data quality, consistency, accessibility, and governance frameworks as well. Proceeding with implementation while leaving this distinction unclear tends to result in situations where model accuracy falls short, or reliability issues surface the moment the system goes into operation.

The Difference Between AI-Ready and Data Readiness

"AI-ready" and "data readiness" are easily confused, but they refer to different layers.

AI-ready is a concept indicating whether an organization as a whole is prepared to adopt AI. It refers to a broad state of readiness that encompasses non-technical factors such as talent, budget, governance frameworks, and executive commitment.

Data readiness, on the other hand, is one component within that broader concept—an evaluation focused specifically on the quality, state of preparation, and accessibility of the data that AI models will actually use.

It is tempting to think at first that "once we introduce an AI tool, we can clean up the data later," but in practice, no matter how high-performing a model you select, you will not achieve the expected accuracy unless data preparation comes first. Even in organizations that have declared themselves AI-ready, it is not uncommon for PoC (proof of concept) efforts to stall due to insufficient data readiness.

The distinction between the two can be summarized as follows:

  • AI-ready: The overall environment supporting AI adoption, including organizational structure, talent, governance, and budget
  • Data readiness: The data-specific state of preparation, including data quality, completeness, accessibility, and security classification
  • Relationship: Data readiness is a prerequisite for AI-readiness (the broader concept ⊃ the narrower concept)

This distinction is especially important when deploying agentic AI. Because agents autonomously reference multiple data sources while executing tasks, data inconsistencies and gaps can trigger cascading reasoning errors.

Why Agentic AI Demands Strict Data Quality

Compared to traditional rule-based automation, agentic AI is far more sensitive to missing or contradictory data. The reason is that agentic AI does not perform a single operation in isolation—it is predicated on multi-step reasoning, in which multiple steps are autonomously chained together.

If the data ingested at the first step contains an error, that error propagates to every subsequent decision. A human might notice something is "off" partway through, but because agentic AI continues processing according to the procedures defined in the system prompt, its opportunities for self-correction are limited.

The main reasons agentic AI is so demanding of data quality are as follows:

  • Context window constraints: There is an upper limit on the amount of data that can be referenced at once, so when unnecessary, duplicate, or contradictory data is mixed in, useful information gets pushed out
  • Compatibility with RAG: When using RAG (Retrieval-Augmented Generation), retrieval accuracy is directly tied to data normalization, freshness, and consistency. If data is outdated or contains many inconsistencies in notation, the vector database is likely to return irrelevant search results
  • Chained tool calls: When calling external APIs or databases, schema mismatches or a high volume of NULL values can cause processing to halt with errors, causing the entire task to fail

The level of data quality required also varies by task. Simple search and summarization tasks can function even with somewhat lower data quality, but tasks involving business judgment, such as order processing or inventory management, require data that satisfies all three criteria of completeness, accuracy, and freshness.

Typical Failures That Occur Without an Audit

"We have the data. All that's left is choosing a model"—it is not uncommon for PoC (proof of concept) efforts to collapse after this judgment is made on the ground.

The failures that tend to occur when an audit is skipped can be grouped into three main patterns.

① Frequent hallucinations: When the training or reference data contains gaps or contradictions, agentic AI will output incorrect information with confidence. Because this is a data problem rather than a model problem, attempting to address it through prompt engineering does not resolve the root cause.

② Degraded RAG retrieval accuracy: When RAG (Retrieval-Augmented Generation) is adopted, the quality of the documents stored in the vector database directly determines the accuracy of responses. When outdated versions of manuals or duplicate files are mixed in, semantic search tends to return irrelevant documents at the top of results, causing the agent to make incorrect judgments.

③ Compliance violations discovered after the fact: When personal information or confidential information is incorporated into training data or search indexes without being properly classified, the risk of violating regulations such as the PDPA (Personal Data Protection Act) arises. If discovered after the system has gone live, it may necessitate a complete shutdown and redesign of the entire system.

What these failures have in common is the false premise that "data problems can be fixed later." In practice, however, attempting to correct data quality issues after model selection tends to multiply the cost and effort to several times that of an initial audit.

The longer the audit is deferred, the higher the risk of rework.

What Should You Prepare Before Starting an Audit?

Conclusion: The success or failure of an audit is determined by the quality of preparation. It is essential to solidify three elements—scope, structure, and data catalog—before getting started.

Starting an audit haphazardly tends to stall progress due to ambiguous scope and the absence of accountable owners. By sequentially working through three preparatory steps—assigning stakeholders, assessing the current state of the data governance structure, and confirming the readiness of the data catalog—the overall accuracy and efficiency of the audit can be greatly improved.

Stakeholder Assignment and Scope Definition

It is easy to assume that an audit can be "handled entirely by the IT department," but in practice, proceeding with a cross-functional team that includes business units, legal, and security can significantly reduce rework in later stages.

There are 4 roles that must be assigned.

  • Data Owner (Business Unit): The person responsible for understanding the definition and intended use of each data source
  • IT / Infrastructure Lead: The person responsible for understanding the actual state of system architecture, access permissions, and integration APIs
  • Legal / Compliance Lead: The person responsible for assessing alignment with applicable personal data protection law (for example, the PDPA) and internal policies
  • AI Project Owner: The decision-maker who can determine the target KPIs and priorities for the agentic AI

Next, it is important to clearly narrow the scope. Targeting "all internal data" tends to prolong the audit, and there are cases where PoC opportunities are missed without ever reaching a conclusion. Start by limiting the scope to 1–2 "agent use cases to run first," and define only the data sources those use cases depend on as the initial scope.

Items to confirm when defining scope

  • The types and frequency of input/output data for the target use cases
  • The range of systems to be used (ERP, CRM, document management, etc.)
  • Agreement on the audit period and final deliverables (report format and approval workflow)

Once the scope is established, sharing the message at the kickoff meeting with all stakeholders that "the purpose of the audit is not to criticize data, but to determine improvement priorities" will make it easier to gain cross-departmental cooperation.

Assessing the Current State of Data Governance

Whether or not a Data Governance framework is in place has a significant impact on the difficulty and duration of the audit. When a framework is established, inquiries to data owners proceed quickly; when it is not, simply confirming "who manages this data" can take several weeks.

The key items to confirm before starting the audit are as follows.

  • Data Owner assignment status: Whether a responsible department and individual are clearly designated for each dataset
  • Documentation of data policies: Whether documents exist covering the scope of data use, retention periods, and deletion rules
  • Change management process: Whether an approval workflow is functioning when schema changes or data migrations occur
  • Access permission inventory: Whether a periodic review is conducted of who has read/write access to which data

The appropriate response will vary depending on the maturity of the framework. If data owners and approval workflows are formally documented, the audit can proceed as planned; however, if management relies solely on informal, individual-dependent practices, it will be necessary to provisionally assign ownership before proceeding with the audit.

Furthermore, a lack of data governance is directly linked to risks that emerge after agentic AI is introduced. Granting an agent data access while permission management remains ambiguous can become a breeding ground for unintended information access and Excessive Agency.

Record the results of the current-state assessment using a three-tier classification—"framework in place / partial / none"—and use this alongside the findings from the next step on data catalog readiness to determine priorities.

Identifying Whether a Data Catalog Exists and Its Maturity Level

"There should be a data catalog somewhere, but no one knows when it was last updated"—this situation is frequently encountered in audit settings. Because the existence and level of readiness of a data catalog is directly tied to audit scope definition, it is an item that must be confirmed at the very outset.

First, it is important to ask not whether a catalog "exists," but whether it is "actually functioning." Evaluate the level of readiness from the following perspectives.

  • Existence check: Whether an official data catalog tool (Alation, Collibra, Apache Atlas, etc.) or a spreadsheet-based alternative is in place
  • Comprehensiveness: Whether major data sources, including ERP (Enterprise Resource Planning) and core business systems, are registered
  • Freshness: Whether the last update date is within the past 3–6 months. An outdated catalog can actually be dangerous, as it may generate misplaced trust
  • Depth of metadata: Whether information beyond table names and column definitions is recorded, including data owners, update frequency, and usage restrictions
  • Access permission records: Whether it is clearly documented who can access which data

Classifying the level of readiness into 3 tiers will be useful when creating the roadmap for subsequent steps.

Step 1: How to Conduct a Data Source Inventory

Without understanding the location, format, and dependencies of data scattered throughout the organization, neither quality assessment nor security review can begin. This is the most critical step—the foundation of the audit. The starting point is a comprehensive inventory that covers not only ERP and business systems, but also the unofficial data generated by shadow AI operating independently within individual departments.

Creating a Data Inventory from Internal Systems (ERP, etc.)

When beginning a data source inventory, many people in charge first try to list "what systems exist." In practice, however, identifying "which tables and fields will be passed to the AI" before simply enumerating system names can significantly reduce rework in later stages.

Basic Steps for Inventory

  • List operational systems by department, including ERP (Enterprise Resource Planning), CRM, inventory management, accounting systems, and others
  • For each system, record in a single line: "data type, update frequency, responsible department, and whether API connection is possible"
  • If the same entity (e.g., customer ID) exists across multiple systems, clearly identify which system serves as the master
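
As an illustration of the inventory steps above, the following is a minimal sketch of a one-line-per-system inventory record and a check for shared entities that lack a single designated master. The class name `InventoryEntry`, its fields, and the function `entities_without_single_master` are hypothetical names chosen for this example, not part of any particular tool.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class InventoryEntry:
    """One inventory line per system (field names are illustrative)."""
    system: str
    data_type: str
    update_frequency: str
    owner_department: str
    api_available: bool
    entities: tuple[str, ...]          # shared entities held, e.g. "customer_id"
    master_for: tuple[str, ...] = ()   # entities this system is the master for


def entities_without_single_master(entries):
    """Return entities held by 2+ systems that do not have exactly one master."""
    holders = defaultdict(list)
    masters = defaultdict(list)
    for entry in entries:
        for entity in entry.entities:
            holders[entity].append(entry.system)
        for entity in entry.master_for:
            masters[entity].append(entry.system)
    return {
        entity: sorted(systems)
        for entity, systems in holders.items()
        if len(systems) > 1 and len(masters[entity]) != 1
    }


inventory = [
    InventoryEntry("ERP", "orders", "real-time", "Operations", True, ("customer_id",)),
    InventoryEntry("CRM", "accounts", "daily", "Sales", True, ("customer_id",)),
]
unresolved = entities_without_single_master(inventory)
```

Here `unresolved` lists `customer_id` because neither system has been declared its master; marking one entry with `master_for=("customer_id",)` clears the finding.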

Why Start with ERP

Because ERP centrally manages core data such as purchase orders, inventory, and financials, much of the data referenced by AI agents originates here. By first obtaining the ERP's module structure and table definition documents and cross-referencing them against the data items required by the agent, the priorities for the inventory become clear.

Tracking Data Lineage and Visualizing Dependencies

Tracking Data Lineage is the process of tracing, end to end, "where data is born, what it passes through, and where it is used." Because agentic AI autonomously chains tasks together, changes in upstream data propagate directly to downstream inference results. Running AI without understanding these dependencies carries the risk of unexplained misjudgments occurring repeatedly.

The main perspectives to track are as follows:

  • Source system: Identify the system where data is first recorded, such as ERP (Enterprise Resource Planning), CRM, sensor logs, etc.
  • Transformation steps: Enumerate the points where data is transformed, such as ETL processing, API integration, and manual processing
  • Consumers: Record where data is read, such as dashboards, ML pipelines, and vector databases for RAG (Retrieval-Augmented Generation)

The appropriate response strategy varies depending on the depth of dependencies. For a simple structure with a single upstream source, a spreadsheet-based ledger may be sufficient for management. On the other hand, for complex structures that pass through multiple intermediate tables or external APIs, introducing automated lineage tracking using a Data Catalog tool should be considered.

When visualizing, it is also recommended to simultaneously identify "single points of failure." It is not uncommon for a specific batch process to stop, causing multiple AI tasks to halt in a chain reaction.
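
For a spreadsheet-scale lineage ledger, the dependency check described above can be sketched with a plain adjacency map. The node names in `LINEAGE` are invented for illustration, and the rule "a node is a single point of failure if at least two consumers depend on it" is an assumption of this sketch rather than a standard definition.

```python
from collections import deque

# Upstream node -> list of direct downstream nodes (hypothetical example data)
LINEAGE = {
    "erp.orders": ["etl.nightly_batch"],
    "crm.accounts": ["etl.nightly_batch"],
    "etl.nightly_batch": ["dwh.sales_mart"],
    "dwh.sales_mart": ["dashboard.sales", "rag.vector_index"],
}


def downstream_of(graph, node):
    """Return every node reachable downstream of `node` (breadth-first)."""
    seen = set()
    queue = deque(graph.get(node, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        queue.extend(graph.get(current, []))
    return seen


def single_points_of_failure(graph, consumers, min_affected=2):
    """Map each non-consumer node to the consumers that stop if it stops."""
    nodes = set(graph) | {d for targets in graph.values() for d in targets}
    result = {}
    for node in nodes:
        if node in consumers:
            continue
        affected = downstream_of(graph, node) & consumers
        if len(affected) >= min_affected:
            result[node] = sorted(affected)
    return result


consumers = {"dashboard.sales", "rag.vector_index"}
spof = single_points_of_failure(LINEAGE, consumers)
```

In this example the nightly batch appears in `spof` because both the dashboard and the RAG index sit downstream of it.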

Discovering Informal Data Created by Shadow AI

"The inventory is done, but somehow the data that people on the ground are actually using isn't on the list" — this situation is rapidly becoming more common with the spread of Shadow AI.

Shadow AI refers to AI tools and automation scripts that individuals or departments use independently, outside the management of the IT department. The forms vary widely: spreadsheet macros, personally contracted cloud AI services, scraped data shared within a department, and more. The data produced by these tools is usually not registered in the official data catalog, and its Data Lineage cannot be tracked.

When agentic AI references such unofficial data, problems occur simultaneously at multiple layers. First, data whose updater and update time are unknown becomes the basis for the agent's judgments, causing quality assurance mechanisms to become a mere formality. Furthermore, the coexistence of official and unofficial data causes consistency checks to stop functioning, and duplications and contradictions quietly accumulate. And the risk that tends to be overlooked is the security risk — the possibility that personal information or confidential information has already leaked to unmanaged services cannot be ruled out.

So how does one discover such unofficial data? The first effective approach is direct interviews with on-site staff. The single question, "Where do you get your data in your day-to-day work?" can surface data sources that do not exist in the official catalog one after another. Another clue is network log analysis: by checking access history to external AI services, it is possible to understand the usage status of unauthorized tools.
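
The network log analysis mentioned above might look like the following sketch. It assumes a simplified log format of `timestamp user domain` separated by whitespace, and the entries in `EXTERNAL_AI_DOMAINS` are placeholder domains; both would need to be replaced with your proxy's real format and the services your organization actually tracks.

```python
from collections import Counter

# Placeholder domains for illustration only
EXTERNAL_AI_DOMAINS = {"chat.ai-service.example", "api.llm-vendor.example"}


def unsanctioned_ai_usage(log_lines, approved_domains=frozenset()):
    """Count requests per (user, domain) to external AI services not on the approved list."""
    counts = Counter()
    for line in log_lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip lines that do not match the assumed format
        _, user, domain = parts
        if domain in EXTERNAL_AI_DOMAINS and domain not in approved_domains:
            counts[(user, domain)] += 1
    return counts


logs = [
    "2024-05-01T09:00 alice chat.ai-service.example",
    "2024-05-01T09:05 alice chat.ai-service.example",
    "2024-05-01T09:06 bob intranet.example",
    "malformed line without three fields here",
]
usage = unsanctioned_ai_usage(logs)
```

The resulting counts are a starting point for the staff interviews, not a substitute for them: they show that a tool is in use, not what data was sent to it.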

Step 2: How to Evaluate Data Quality

Data sources identified through the inventory are evaluated from multiple angles for quality. The four fundamental axes are completeness, accuracy, consistency, and timeliness. After checking each of these in turn, suitability for RAG and vector databases, as well as the feasibility of utilizing synthetic data, are also examined.

Four-Axis Check: Completeness, Accuracy, Consistency, and Freshness

In data quality evaluation, it is easy to think "just checking the record count will do," but in practice, even if the number of records is large, if the contents contain missing values or contradictions, the judgment accuracy of the AI agent will be significantly impaired. Systematically checking across four axes is the most direct way to prevent rework after implementation.

Completeness: Measure the missing rate of required fields.

  • It is not uncommon for fields such as "industry code" or "person-in-charge ID" in a customer master to be left blank
  • Fields with high missing rates directly affect RAG (Retrieval-Augmented Generation) search accuracy and therefore require priority remediation

Accuracy: Verify that records match actual conditions.

  • If the inventory count in ERP (Enterprise Resource Planning) diverges from the actual stock in the warehouse, it becomes a cause of demand forecasting AI issuing incorrect order proposals
  • This can be detected by cross-referencing against external master data or reference data

Consistency: Verify that the same entity holds the same value across multiple systems.

  • If the naming conventions for customer codes differ between sales management and accounting, the agent will treat the same customer as separate entities, causing errors in aggregated results
  • Tracking Data Lineage and identifying discrepancies in transformation rules is an effective approach

Timeliness (freshness): Evaluate whether data is updated at a time when it can be used for decision-making.
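
Three of the four axes lend themselves to simple automated checks. The sketch below shows one possible way to measure a missing rate, find cross-system value mismatches, and flag stale data. The function names, the treatment of empty strings as missing, and the sample records are assumptions for illustration. Accuracy is omitted because it requires comparison against real-world or reference data.

```python
from datetime import date


def missing_rate(records, field):
    """Completeness: share of records where `field` is absent, None, or empty."""
    if not records:
        return 0.0
    missing = sum(1 for r in records if r.get(field) in (None, ""))
    return missing / len(records)


def inconsistent_keys(system_a, system_b):
    """Consistency: keys present in both systems whose values differ."""
    shared = system_a.keys() & system_b.keys()
    return sorted(k for k in shared if system_a[k] != system_b[k])


def is_stale(last_updated, today, max_age_days):
    """Timeliness: True if the data is older than the allowed age."""
    return (today - last_updated).days > max_age_days


customers = [
    {"id": 1, "industry_code": "A"},
    {"id": 2, "industry_code": ""},
    {"id": 3},
    {"id": 4, "industry_code": "B"},
]
rate = missing_rate(customers, "industry_code")  # 2 of 4 records are missing the field
mismatches = inconsistent_keys(
    {"C001": "Acme", "C002": "Beta"},
    {"C001": "ACME Corp", "C002": "Beta"},
)
stale = is_stale(date(2024, 1, 1), today=date(2024, 1, 10), max_age_days=7)
```

What counts as an acceptable missing rate or maximum age depends on the use case and should be agreed with the data owner before the check is run.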

Validating Compatibility with RAG and Vector Databases

In agentic AI systems that leverage RAG (Retrieval-Augmented Generation) and vector databases, the "retrieval suitability" of text data directly determines the quality of responses. Even if general quality metrics such as completeness and accuracy are satisfied, if RAG-specific requirements are not met, the agent risks continuously referencing incorrect documents.

Key Points to Validate

  • Appropriateness of chunk size: If the unit used to split documents is too large, it strains the context window; if too small, it causes semantic fragmentation. Using approximately 500–1,000 tokens as a baseline, adjustments should be made for each document type.
  • Embedding consistency: If there are variations in notation for the same concept (e.g., "Customer ID" vs. "Client Number"), the accuracy of semantic search degrades. The state of terminology standardization should be confirmed in advance.
  • Metadata assignment status: If metadata such as document creation date, department, or version is missing, errors can easily occur where outdated information is referenced as the latest.
  • Proportion of duplicate or conflicting documents: When multiple versions of the same document exist, similarity scores become dispersed within the vector database, lowering the priority ranking of the correct document.
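
Two of the points above, chunk size and duplicate documents, can be prototyped without any vector database. The following sketch splits text into overlapping chunks and groups documents whose normalized text is identical. It counts whitespace-separated words as a rough stand-in for tokens (real token counts depend on the tokenizer of the embedding model), and it only catches exact duplicates after normalization, not near-duplicate versions.

```python
import hashlib


def chunk_words(text, max_words=200, overlap=20):
    """Split text into chunks of up to `max_words` words, overlapping by `overlap` words."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def exact_duplicates(docs):
    """Group document names whose case- and whitespace-normalized text is identical."""
    groups = {}
    for name, text in docs.items():
        normalized = " ".join(text.lower().split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        groups.setdefault(digest, []).append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


sample = " ".join(str(i) for i in range(10))
chunks = chunk_words(sample, max_words=4, overlap=1)
dupes = exact_duplicates({
    "manual_v1.txt": "Reset the  device.",
    "manual_copy.txt": "reset the device.",
    "faq.txt": "Contact support.",
})
```

The proportion of documents that fall into a duplicate group gives a first estimate for the last checklist item; detecting conflicting versions still needs metadata such as version or creation date.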

Decision Criteria Based on Conditional Branching

If internal documents consist of structured manuals, a hybrid search combining chunk splitting with BM25 tends to be a good fit.

Criteria for Supplementing Quality with Synthetic Data

When a data quality issue is discovered, immediately concluding that "synthetic data can fill the gap" is a common misjudgment seen in practice. Synthetic data is not a universal solution, and it is important to distinguish between situations where it should and should not be used.

Synthetic data is effective when training or test samples are statistically insufficient, as a substitute when real data containing personal information cannot be used directly, and when you want to intentionally increase the volume of anomalies or rare patterns that seldom occur in actual operations.

On the other hand, if synthetic data is generated without a clear understanding of the real data's distribution, there is a risk that the model will learn biased patterns that diverge from reality. Business data extracted from ERP systems in particular carries industry-specific correlations, and substituting it with synthetic data may reduce RAG retrieval accuracy.

When making this judgment, first confirm whether the statistical characteristics of the real data—mean, variance, and distribution shape—are understood. If these are unknown, you should not proceed with generating synthetic data. Next, you must also verify whether a validation process is in place to quantitatively measure the statistical divergence between synthetic and real data after generation.
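
One way to make the "statistical divergence" check concrete for a single numeric column is sketched below. It compares mean and variance and computes the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical distribution functions). The choice of these three measures and the function names are assumptions of this example; the article does not prescribe a specific metric, and real validation would cover correlations between columns as well.

```python
from bisect import bisect_right
from statistics import mean, pvariance


def ks_statistic(real, synthetic):
    """Largest absolute gap between the two empirical CDFs (0 = identical, 1 = disjoint)."""
    a, b = sorted(real), sorted(synthetic)
    points = sorted(set(a) | set(b))
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in points
    )


def divergence_report(real, synthetic):
    """Summarize how far a synthetic column drifts from the real one.

    Raises ZeroDivisionError if the real column has zero variance.
    """
    return {
        "mean_diff": abs(mean(real) - mean(synthetic)),
        "variance_ratio": pvariance(synthetic) / pvariance(real),
        "ks": ks_statistic(real, synthetic),
    }


report = divergence_report([1, 2, 3, 4], [2, 3, 4, 5])
```

Acceptance thresholds for each measure are a project decision; the point of the sketch is only that the comparison is quantitative and repeatable.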

Step 3: How to Verify Security and Compliance

Conclusion: Even if data quality is in order, skipping security and compliance verification will directly expose an AI implementation to legal risk.

We systematically check the safety of internal data from three perspectives: classification of personal information, assessment of data poisoning risks, and verification of alignment with AI governance policies.

Classifying Personal and Confidential Information and Confirming PDPA Compliance

What tends to be overlooked in security audits is the classification work itself—determining "which data requires protection." It is easy to initially assume that "personal information is only in the customer table," but in reality, many cases have been reported where personally identifiable information is also mixed into employee activity logs, inquiry histories, and even transaction data in ERP (Enterprise Resource Planning) systems.

Before granting an AI agent access to data, the following classifications must first be completed.

  • Public information: Externally published catalogs, specifications, etc.
  • Internal-only information: Internal business data, meeting minutes, approval documents, etc.
  • Confidential information: Trade secrets, financial forecasts, contract terms, etc.
  • Personal information (requiring protection): Names, contact details, purchase history, biometric data, etc.

For businesses operating in Thailand, confirming compliance with the PDPA (Personal Data Protection Act) is mandatory. The law requires that the purpose of collecting and using personal data be clearly stated, and that the legal basis for processing (consent, legitimate interest, etc.) be documented. In configurations where an AI agent autonomously references and processes data, failure to capture logs of "who accessed what data, when, and for what purpose" creates compliance risks.
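
A minimal sketch of an access log record that captures "who accessed what data, when, and for what purpose" is shown below. The field names, the JSON-lines output, and the rule that purpose and legal basis must be non-empty are assumptions for illustration, not requirements quoted from the PDPA.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AccessLogEntry:
    actor: str        # agent or user identifier
    dataset: str      # what was accessed
    purpose: str      # why it was accessed
    legal_basis: str  # e.g. "consent", "legitimate interest"
    accessed_at: str  # ISO 8601 timestamp in UTC


def record_access(actor, dataset, purpose, legal_basis, now=None):
    """Return one JSON line describing a data access; refuse entries without purpose or basis."""
    if not purpose or not legal_basis:
        raise ValueError("purpose and legal_basis are required")
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    entry = AccessLogEntry(actor, dataset, purpose, legal_basis, timestamp)
    return json.dumps(asdict(entry), sort_keys=True)


line = record_access(
    "agent-1", "crm.customers", "order status lookup", "consent",
    now=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
```

During the audit, the useful question is whether every agent data access path can produce a record of this kind, regardless of the exact schema.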

The following are the audit checkpoints to keep in mind.

Assessing Data/Model Poisoning Risks

Data/Model Poisoning is the risk that malicious or erroneous information is introduced into training data or the vector databases used as retrieval targets in RAG, intentionally or unintentionally distorting the output of an AI agent. Agentic AI systems are particularly dangerous in this regard because they autonomously and repeatedly make tool calls and retrieve external data, meaning that once-contaminated data can spread its impact in a chain reaction.

During evaluation, confirm the following points in order.

  • Mapping data input pathways: Enumerate the pathways through which unmanaged data—such as external APIs, user inputs, and web scraping—can be introduced into internal vector databases or feature stores.
  • Auditing update and write permissions: Confirm who and which systems are authorized to update training data or RAG indexes, and verify that permissions are minimized.
  • Tracking data lineage: Trace the transformation history from the data's origin to its current storage location, and verify whether any unauthorized changes have been recorded.
  • Detecting anomalies and statistical deviations: Score patterns that are likely to be indicators of poisoning, such as label imbalances, sudden distribution shifts, and concentrations of duplicate records.
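
The last checkpoint can be approximated with a few simple indicators. The sketch below computes a label imbalance ratio, the share of duplicate records, and how far a new batch's mean has moved from a baseline in units of the baseline's standard deviation. These particular indicators and function names are assumptions chosen for illustration; unusual values are a prompt for investigation, not proof of poisoning.

```python
from collections import Counter
from statistics import mean, pstdev


def label_imbalance(labels):
    """Ratio of the most common label count to the least common one."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def duplicate_ratio(records):
    """Share of records that repeat an earlier identical record."""
    if not records:
        return 0.0
    counts = Counter(records)
    duplicates = sum(c - 1 for c in counts.values())
    return duplicates / len(records)


def mean_shift(baseline, current):
    """Shift of the current mean from the baseline mean, in baseline standard deviations."""
    return abs(mean(current) - mean(baseline)) / pstdev(baseline)


imbalance = label_imbalance(["ok", "ok", "ok", "fraud"])
dup_share = duplicate_ratio(["x", "x", "y", "x"])
shift = mean_shift([1, 2, 3], [4, 5, 6])
```

Note that `mean_shift` divides by the baseline's standard deviation and therefore fails on a constant baseline; a production check would need to handle that case.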

As a decision criterion, if data sources are limited to a closed, internally managed environment, prioritizing the strengthening of internal access controls is recommended; if external data or synthetic data generated by an LLM is being ingested, establishing a content validation pipeline should take precedence.

Checking Alignment with AI Governance Policies

Even when data security measures are in place, many teams have not verified whether their AI governance policies actually align with how data is being used.

An AI governance policy alignment check should cover the following points:

  • Explicit statement of purpose: Whether each dataset's intended AI use case is clearly defined and approved under policy
  • Compliance with data retention and deletion rules: Whether data past the retention period defined in policy has been kept out of agent training and reference data
  • Licensing for third-party data: Whether externally sourced data includes license terms that prohibit its use in AI
  • Input restrictions for models: Whether the system is designed to prevent highly confidential data from flowing unrestricted into an agent's context window
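
The retention check in particular is easy to automate once each record carries a creation date. The following sketch lists records that are older than the retention period and therefore should not be in an agent's training or reference data. The record layout (`id`, `created`) and the function name are hypothetical.

```python
from datetime import date, timedelta


def past_retention(records, retention_days, today):
    """Return the ids of records created before the retention cutoff."""
    cutoff = today - timedelta(days=retention_days)
    return [r["id"] for r in records if r["created"] < cutoff]


indexed_docs = [
    {"id": "contract-2020", "created": date(2020, 1, 1)},
    {"id": "contract-2024", "created": date(2024, 6, 1)},
]
expired = past_retention(indexed_docs, retention_days=365, today=date(2024, 12, 31))
```

Any id returned here would be recorded as "Requires Revision" or "Not Permitted for Use", depending on the policy.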

In practice, the alignment check involves cross-referencing data flows against policy documents, such as those described in "What is AI Governance? A Practical Guide from EU AI Act Compliance to Internal Rule Development," as the central point of reference.

Findings are recorded in three categories — "Compliant," "Requires Revision," and "Not Permitted for Use" — and carried forward into the scoring process in Step 4. If policies are not yet in place, a practical approach is to proceed with drafting them in parallel with the audit.

Step 4: How to Score Audit Results

Conclusion: Only by converting the evaluation data collected during the audit into quantitative scores and defining clear pass/fail thresholds can improvement priorities be properly established.

This section explains how to translate the quality, security, and governance evaluation results identified in Steps 1–3 into a scoring framework and priority matrix. Quantifying the results makes it significantly easier to communicate with senior management and develop an improvement roadmap.

Calculating the Readiness Score and Determining Pass/Fail Thresholds

When audit findings are summarized based on gut feeling rather than data, reaching cross-departmental consensus becomes difficult and decision-making tends to stall. Quantifying scores through a scoring framework makes it far smoother to brief senior management and discuss improvement priorities.

Score Calculation Procedure

The practical approach is to assign scores for each evaluation dimension and calculate a weighted average for an overall score. The following five dimensions are recommended as a standard set.

Evaluation Dimension | Weighting (out of 100)
Completeness & Accuracy | 25 points
Consistency & Freshness | 20 points
Accessibility & Structuredness | 20 points
Security & Compliance | 20 points
Data Governance Framework | 15 points

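Each dimension is rated on a three-level scale: "0: Not in place," "1: Partially in place," "2: Fully in place." Divide each rating by the maximum of 2, multiply it by the dimension's weighting, and sum the results, so that a dataset rated "Fully in place" on every dimension scores 100.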
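
The calculation can be sketched as follows. The weights come from the table above; the dictionary keys are shortened names invented for this example. The sketch assumes each 0 to 2 rating is divided by the maximum rating of 2 before weighting, so that the total falls on a 0 to 100 scale and can be compared with thresholds expressed out of 100.

```python
WEIGHTS = {
    "completeness_accuracy": 25,
    "consistency_freshness": 20,
    "accessibility_structuredness": 20,
    "security_compliance": 20,
    "governance_framework": 15,
}
MAX_RATING = 2  # 0: not in place, 1: partially in place, 2: fully in place


def readiness_score(ratings):
    """Weighted readiness score on a 0-100 scale."""
    if set(ratings) != set(WEIGHTS):
        raise ValueError("ratings must cover exactly the five dimensions")
    total = 0.0
    for dimension, rating in ratings.items():
        if rating not in (0, 1, 2):
            raise ValueError(f"invalid rating for {dimension}: {rating}")
        total += WEIGHTS[dimension] * rating / MAX_RATING
    return total


score = readiness_score({
    "completeness_accuracy": 2,         # 25.0
    "consistency_freshness": 1,         # 10.0
    "accessibility_structuredness": 1,  # 10.0
    "security_compliance": 2,           # 20.0
    "governance_framework": 0,          #  0.0
})
# score == 65.0
```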

Pass/Fail Threshold Guidelines

It is tempting to set a uniform threshold such as "a total score of 70 or above qualifies for deployment," but in practice it is more effective to adjust the standard based on the complexity of the use case. For a simple FAQ chatbot, a PoC can begin even with a score in the 60s, whereas for scenarios in which an agentic AI makes autonomous decisions across multiple systems, a score of 80 or above is advisable.

  • Below 60: Defer deployment. Prioritize rebuilding the data foundation.
  • 60–79: Begin a PoC within a limited scope while pursuing data improvements in parallel.
  • 80 and above: Ready to proceed to the full deployment phase.
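
The three bands above translate directly into a small decision function. The function name and the returned labels are hypothetical, and the sketch applies the bands as listed without adjusting them for use-case complexity.

```python
def deployment_decision(score):
    """Map a 0-100 readiness score to the band described above."""
    if score < 60:
        return "defer"            # rebuild the data foundation first
    if score < 80:
        return "limited_poc"      # limited-scope PoC with parallel data improvements
    return "full_deployment"      # ready for the full deployment phase


decisions = [deployment_decision(s) for s in (45, 60, 79, 80)]
```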

Important Note

Scores are, above all, a relative indicator.

Building an Improvement Roadmap Using a Priority Matrix

Once scoring is complete, the next critical step is determining the order in which issues should be addressed. Attempting to resolve all problems simultaneously risks spreading resources too thin and stalling the overall project.

In the priority matrix, each issue is plotted along two axes: impact (contribution to AI accuracy and stability) and remediation cost (effort, expense, and technical complexity).

  • High impact / Low cost (Immediate action): Filling in missing values, removing duplicate records, etc. Begin within a 1–2 week sprint.
  • High impact / High cost (Planned action): ERP (Enterprise Resource Planning) integration, data lineage development, etc. Target completion before the start of the PoC (Proof of Concept).
  • Low impact / Low cost (Address when capacity allows): Correcting inconsistent metadata notation, etc.
  • Low impact / High cost (Defer or exclude): Full overhaul of legacy systems, etc.
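
The four quadrants can be expressed as a small classification routine. In this sketch, impact and cost are each rated from 1 to 5, and a rating of 3 or more counts as "high"; both the scale and the cut-off are assumptions made for illustration, as are the issue names.

```python
from dataclasses import dataclass

QUADRANT_ORDER = ["immediate", "planned", "when_capacity_allows", "defer_or_exclude"]


@dataclass(frozen=True)
class Issue:
    name: str
    impact: int  # 1 (low) to 5 (high) contribution to AI accuracy and stability
    cost: int    # 1 (low) to 5 (high) remediation effort, expense, and complexity


def quadrant(issue, threshold=3):
    """Place an issue in one of the four priority-matrix quadrants."""
    high_impact = issue.impact >= threshold
    high_cost = issue.cost >= threshold
    if high_impact and not high_cost:
        return "immediate"
    if high_impact and high_cost:
        return "planned"
    if not high_cost:
        return "when_capacity_allows"
    return "defer_or_exclude"


def prioritize(issues):
    """Sort issues by quadrant, then by higher impact, then by lower cost."""
    return sorted(
        issues,
        key=lambda i: (QUADRANT_ORDER.index(quadrant(i)), -i.impact, i.cost),
    )


backlog = [
    Issue("legacy system overhaul", impact=1, cost=5),
    Issue("fill missing values", impact=5, cost=1),
    Issue("ERP integration", impact=5, cost=5),
    Issue("metadata notation cleanup", impact=2, cost=1),
]
ordered = [issue.name for issue in prioritize(backlog)]
```

The sorted list gives a first draft of the roadmap order, which stakeholders can then adjust for dependencies between issues.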

When there are many low-cost issues, they can be resolved sequentially through 2–4 week sprints. However, when structural changes to core systems are involved, it is more realistic to establish a 3–6 month phase and proceed incrementally.

By specifying achievement criteria for each phase, the improvement roadmap makes progress visible. For example, setting a quantitative gate such as "the data completeness score meets or exceeds the benchmark upon completion of Phase 1" makes it easier to build consensus among stakeholders.

Once the audit findings and improvement roadmap are in place, the groundwork is laid to move forward to the next steps toward full deployment of agentic AI.

Author & Supervisor

Yusuke Ishihara

Started programming at age 13 with MSX. After graduating from Musashi University, worked on large-scale system development including airline core systems and Japan's first Windows server hosting/VPS infrastructure. Co-founded Site Engine Inc. in 2008. Founded Unimon Inc. in 2010 and Enison Inc. in 2025, leading development of business systems, NLP, and platform solutions. Currently focuses on product development and AI/DX initiatives leveraging generative AI and large language models (LLMs).