AI Implementation PoC Design Guide — Practical Steps for Thai B2B Companies to Make Go-Live Decisions


Lead

A PoC (Proof of Concept) for AI adoption is the process of verifying, on a small scale and within a short timeframe, whether a new AI technology can function in production operations. Many companies run PoCs, yet "PoC fatigue", in which verification is completed but the project never reaches production, remains a persistent problem.

This article is intended for DX promotion leads, information systems departments, and corporate planning staff at companies operating B2B businesses in Thailand. It systematically organizes the typical patterns by which PoCs become hollow exercises, then explains a 4-step framework (scope definition, success criteria, architecture, and GO/NO-GO gates) for designing PoCs that lead to production deployment decisions. By the end of the article, readers will be able to identify why their own PoCs are failing to reach production and will have a clear picture of what to include in their next PoC design document.

A PoC is not a venue for confirming whether something "worked or didn't work"—it is a venue for deliberately gathering the information needed to make a production deployment decision. A PoC that launches without this shared premise tends to stall at the review meeting stage and struggle to reach productization. This section clarifies the distinct roles of PoC, pilot, and production operations, and organizes the four typical patterns that cause PoCs to become hollow exercises.

Differences Between PoC, Pilot, and Production

PoC, pilot, and production operations are distinct phases with different purposes, scopes, and evaluation targets. Conflating them while still calling everything a "PoC" creates misaligned expectations and mismatched evaluation criteria.

Phase | Primary Purpose | Scope | Key Evaluation Targets
PoC | Validation of technical hypotheses | Limited tasks, limited users, limited data | Model accuracy, processing time, basic user acceptance
Pilot | Validation of practical usability in business operations | Near-operational conditions within one department or one site | Operational efficiency, operational burden, exception handling, support structure
Production Operations | Company-wide rollout and ongoing operation | Entire target workflow / multiple sites | SLA, cost, governance, continuous improvement structure

Attempting to determine within a PoC whether something can be integrated into business operations causes the verification period to drag on and the scope to balloon. It is more practical to limit the PoC strictly to determining "whether there is sufficient value to proceed toward production," and to reserve the final confirmation of operational fit for the pilot phase.

Four Typical Patterns of Hollow PoC

The typical patterns in which a PoC ends with "it worked" can be broadly grouped into four categories.

  1. Evaluation criteria are not documented in advance — "Getting a feel for it by trying it out" becomes the de facto goal, and the project moves forward without any agreed-upon quantitative or qualitative criteria. Toward the end, debate erupts over whether the outcome can even be called a success.
  2. The scope is not deliberately narrowed — Broad challenges such as "AI-automate all company-wide inquiry handling" are taken on within the PoC, and the entire timeframe is consumed just by data preparation.
  3. The business owner is absent and only the IT department is involved — Even after verification is complete, the business side has no understanding of how their work will change, causing the production deployment proposal to stall during stakeholder alignment.
  4. Production conditions are undefined at the start of the PoC — After the PoC concludes, no one can answer the question "what do we need to do next to move to production?" The project ends with a written report and nothing more.

These are not problems unique to individual teams or sites—they are structural flaws in a design that defines the PoC as nothing more than a "verification phase."

Why PoCs Fail to Scale

A PoC that ends with slides showing "it worked" and applause from attendees—this is a scene commonly seen at AI implementation sites in both Thailand and Japan. The fact that it worked is real, but it provides no basis for concluding that the solution will scale to production operations.

The reason it fails to scale is not the technology itself—it lies in the connective tissue between phases.

  • Gap between the verification environment and the production environment — PoCs use clean evaluation data, whereas production involves noise, missing values, and unexpected formats. Accuracy can degrade significantly from the environment change alone.
  • Insufficient representativeness of the data — The 100–500 records used for PoC verification do not represent the tens of thousands of records processed annually in production. Edge cases remain invisible during verification.
  • Operational costs and governance left unconsidered — Elements that receive little focus during a PoC—such as model inference costs, retraining frequency, monitoring structures, log retention, and the handling of personal information in compliance with the PDPA—balloon when production deployment is estimated.
  • Asymmetric expectations among stakeholders — Management expects "immediate company-wide rollout," the IT department knows "the architecture needs to be rebuilt," and the business side feels "this doesn't fit our work"—a misalignment of expectations that never makes it into the report.

Unless these connective points are addressed at the PoC design stage, a successful verification will not translate directly into production deployment.

PoC Design Prerequisites — Preparing for Production

Before deciding what to validate in a PoC, work backwards from the information needed to proceed to production. By designing three elements before the PoC begins—stakeholder alignment, connection points to production operations, and evaluation logic—the validation results can feed directly into the production go/no-go decision.

Designing Stakeholder Alignment

Identify the decision-makers and stakeholders for the PoC at the outset, and document what each party expects from it.

Key stakeholders and their areas of interest in the PoC break down as follows:

  • Executive leadership / Business unit heads: Investment decisions (ROI, competitive benchmarking, risk). They want concise, quantitative summaries.
  • Business divisions / End users: How their work will change. They want hands-on exposure through concrete use cases.
  • IT / Information Systems: Architectural alignment, operational burden, and impact on existing systems. They want technical detail.
  • Legal / Compliance: Data handling, compliance with Thailand's PDPA and Japan's Act on the Protection of Personal Information, and adherence to industry regulations.
  • Finance: Total cost estimates for production, TCO, and transparency of cost structure.

Document these expectations before the PoC begins, then work backwards to determine "who gets shown what, and at what level of detail" at the final readout. Deciding the structure of that readout in advance clarifies what data needs to be collected during the validation.

Defining the Connection Points to Production First

A PoC is not a self-contained experiment—it is the precursor to production operations. When designing the PoC, provisionally define the following connection points.

  • Integration targets — Of the core systems, CRM, groupware, and data infrastructure, which will the production system connect to? Even if the PoC uses an intermediate file-based approach as a shortcut, the direction of the production architecture should be decided in advance.
  • User journey — In production, through which screen or application will users access the AI functionality? Will a dedicated UI be built, or will it be embedded in an existing tool?
  • Data flow — Will inference be real-time or batch? What are the data retention periods and access frequencies?
  • Security and audit requirements — Authentication integration, access control, log retention, and audit trails. When handling business data in Thailand in particular, it will be necessary to design for PDPA consent acquisition and data subject rights.
  • Multilingual requirements — When handling a mix of Thai, Japanese, and English, even if the PoC is scoped to a single language, the cost of multilingual support in production must be estimated.

None of these need to be fully implemented in the PoC. The goal is to maintain a list of "what will be required in production" so that the additional work needed upon a successful PoC outcome is made visible.
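One lightweight way to keep that list is as a small register that pairs each connection point's PoC approach with its provisional production direction, so the gaps can be listed mechanically. The sketch below is a hypothetical Python structure; the field names and example entries are assumptions for illustration, not a prescribed format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionPoint:
    """One row of a hypothetical 'required in production' register."""
    area: str             # e.g. integration target, user journey, data flow
    poc_approach: str     # what the PoC actually does (may be a shortcut)
    production_plan: str  # provisional production direction agreed at design time

    @property
    def has_gap(self) -> bool:
        # A gap exists whenever the PoC shortcut differs from the production plan.
        return self.poc_approach != self.production_plan


# Illustrative entries only; replace with your own systems and decisions.
REGISTER = [
    ConnectionPoint("Integration target", "CSV export from CRM", "CRM API integration"),
    ConnectionPoint("User journey", "Standalone web UI", "Embedded in existing groupware"),
    ConnectionPoint("Data flow", "Nightly batch", "Nightly batch"),
    ConnectionPoint("Security and audit", "Shared test account",
                    "SSO, role-based access, log retention, PDPA consent records"),
    ConnectionPoint("Languages", "Thai only", "Thai, Japanese, English"),
]


def additional_work(register: list[ConnectionPoint]) -> list[str]:
    """Areas where a successful PoC still leaves work before production."""
    return [cp.area for cp in register if cp.has_gap]


if __name__ == "__main__":
    for area in additional_work(REGISTER):
        print("Additional work before production:", area)
```

Deriving the gap from the two text fields, rather than storing a separate flag, keeps the register from drifting out of sync when either side is updated.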

Minimum Setup for Evaluation Logic

At a minimum, establish a skeleton for the evaluation logic before the PoC begins. If the evaluation criteria shift midway through, discussions will drift from "the validity of the PoC results" to "the validity of the evaluation criteria."

The four minimum elements to have in place are:

  • Evaluation dataset — Document the number of samples, the time period covered, and who applied the labels. Separate representative samples from challenge samples that intentionally include edge cases.
  • Definition of expected output — Where a single correct answer exists, prepare ground truth data. Where consensus rules exist, prepare a rules document. Where human judgment is required, define the criteria for selecting reviewers.
  • Metrics — In addition to standard machine learning metrics such as precision, recall, and F1, also incorporate business metrics (processing time reduction rate, human intervention rate, user satisfaction).
  • Judgment process — Define when, by whom, and against what criteria the evaluation will be conducted. When multiple evaluators are involved, align on judgment standards in advance.
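As a minimal sketch of the metrics element, the Python snippet below computes precision, recall, F1, and a human intervention rate separately for the representative and challenge (edge-case) subsets. It assumes a hypothetical binary task (for example, "does this inquiry need escalation?") and a per-case flag recording whether a person had to correct the AI output; multi-class or free-text tasks would need different scoring.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    subset: str      # "representative" or "challenge" (intentional edge cases)
    expected: bool   # ground-truth label agreed before the PoC
    predicted: bool  # label produced by the AI
    corrected: bool  # True if a person had to fix or override the AI output


def metrics(cases: list[Case]) -> dict[str, float]:
    tp = sum(c.expected and c.predicted for c in cases)
    fp = sum(c.predicted and not c.expected for c in cases)
    fn = sum(c.expected and not c.predicted for c in cases)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Business metric: share of outputs that needed human correction.
    intervention = sum(c.corrected for c in cases) / len(cases) if cases else 0.0
    return {"precision": precision, "recall": recall, "f1": f1,
            "human_intervention_rate": intervention}


def metrics_by_subset(cases: list[Case]) -> dict[str, dict[str, float]]:
    """Report each subset separately so edge cases are not averaged away."""
    subsets = sorted({c.subset for c in cases})
    return {s: metrics([c for c in cases if c.subset == s]) for s in subsets}


if __name__ == "__main__":
    sample = [
        Case("representative", True, True, False),
        Case("representative", False, True, True),
        Case("challenge", True, False, True),
    ]
    for subset, m in metrics_by_subset(sample).items():
        print(subset, {k: round(v, 2) for k, v in m.items()})
```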

Deciding on the evaluation logic after the fact creates the temptation to select criteria that favor the results. This is the single greatest factor that undermines the credibility of a PoC.

Step 1: Defining Scope and Business Challenges

To a large extent, a PoC's success or failure is determined by the initial problem definition. No matter what AI technology is applied to a vaguely defined problem, the outcome will be "it worked, but the value is unclear." In Step 1, define the business problem with high resolution and deliberately narrow the scope, constraining it to a range where a decision can be reached within the PoC timeframe.

Sharpening the Resolution of Business Challenges

Setting business challenges too abstractly — such as "improve sales productivity" — makes it unclear what should be validated in a PoC. The following are typical patterns for increasing resolution:

  • NG: "We want to improve sales productivity." The scope of validation is too broad, making it impossible to determine whether AI can improve the relevant work.
  • OK: "We want to cut the information-gathering phase of quote creation (aggregating past projects, pricing, and inventory) from its current average of 45 minutes to under 10 minutes." The target task, current value, and goal are clearly defined, and the degree of improvement is measurable.

To increase resolution, break down the current workflow into individual task units. Understand who does what, when, and how long it takes through one to two weeks of workflow observation or on-site interviews. Observation may lead to the conclusion that redesigning the workflow would be more effective than applying AI. In that case, retain the option to move directly to process reform without conducting a PoC.
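To show how observation data turns into a measurable target, here is a small Python sketch built on the quote-creation example above. The observation log, task names, and minute values are invented for illustration (chosen so the three information-gathering tasks average 45 minutes in total); in practice they would come from the one to two weeks of observation or interviews.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical observation log: (phase, task, minutes observed for one quote).
OBSERVATIONS = [
    ("info gathering", "aggregate past projects", 20),
    ("info gathering", "aggregate past projects", 25),
    ("info gathering", "aggregate past projects", 22),
    ("info gathering", "aggregate past projects", 21),
    ("info gathering", "check pricing", 12),
    ("info gathering", "check pricing", 14),
    ("info gathering", "check pricing", 13),
    ("info gathering", "check pricing", 13),
    ("info gathering", "check inventory", 10),
    ("info gathering", "check inventory", 9),
    ("info gathering", "check inventory", 11),
    ("info gathering", "check inventory", 10),
    ("drafting", "write quote document", 30),
    ("drafting", "write quote document", 35),
]


def task_averages(observations):
    """Average observed minutes per (phase, task)."""
    grouped = defaultdict(list)
    for phase, task, minutes in observations:
        grouped[(phase, task)].append(minutes)
    return {key: mean(values) for key, values in grouped.items()}


def phase_baseline(observations, phase):
    """Average time for a phase, assuming every quote goes through each task once."""
    return sum(avg for (p, _), avg in task_averages(observations).items() if p == phase)


def required_reduction(baseline_minutes, target_minutes):
    """Fractional reduction needed to reach the target from the baseline."""
    return (baseline_minutes - target_minutes) / baseline_minutes


if __name__ == "__main__":
    baseline = phase_baseline(OBSERVATIONS, "info gathering")
    print(f"Baseline: {baseline} min, required reduction: {required_reduction(baseline, 10):.0%}")
```

With a 45-minute baseline, the under-10-minute goal corresponds to a reduction of roughly 78%, which makes the bar the AI has to clear explicit before any technology is chosen.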

Investing time in increasing the resolution of business challenges is the most effective preparatory step for improving the overall success rate of a PoC.

Intentionally Narrowing the Scope

The temptation to expand scope is ever-present. Examples include: "Other departments have requests, so let's validate them together," "Supporting multiple languages would be useful in the future," or "The vendor recommended adding more features." A broad PoC fails to produce results within a short timeframe, causing decision-making to be deferred.

Set the following criteria for narrowing PoC scope:

  • Duration: 6–8 weeks. Anything longer should be called a pilot, not a PoC.
  • Target workflow: Limit to one. If multiple workflows are involved, run them as separate, sequential PoCs.
  • Target users: 5–10 people. This is sufficient as long as representative roles and skill levels are included.
  • Target data: Limit to a defined period, such as the most recent three months. Edge cases should be evaluated separately in a batch.
  • Target language/region: One language, one location. Multi-language and multi-site support should be separated out into the production rollout estimate.
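These criteria are concrete enough to check automatically when a PoC proposal is drafted. The Python sketch below encodes them as a simple validator; the `PocScope` fields are hypothetical, and the three-month data window is the article's example rather than a hard rule, so it is exposed as a parameter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PocScope:
    duration_weeks: int
    workflows: int
    users: int
    data_months: int
    languages: int
    sites: int


def scope_issues(scope: PocScope, max_data_months: int = 3) -> list[str]:
    """Return the ways a proposed scope exceeds the narrowing criteria (empty = OK)."""
    issues = []
    if not 6 <= scope.duration_weeks <= 8:
        issues.append("duration outside 6-8 weeks (longer efforts are pilots)")
    if scope.workflows != 1:
        issues.append("target exactly one workflow; run others as sequential PoCs")
    if not 5 <= scope.users <= 10:
        issues.append("user count outside 5-10")
    if scope.data_months > max_data_months:
        issues.append(f"data window longer than {max_data_months} months")
    if scope.languages != 1 or scope.sites != 1:
        issues.append("multi-language or multi-site; move to the production rollout estimate")
    return issues


if __name__ == "__main__":
    proposal = PocScope(duration_weeks=12, workflows=2, users=20,
                        data_months=6, languages=3, sites=2)
    for issue in scope_issues(proposal):
        print("Narrow the scope:", issue)
```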

To the objection that "narrowing the scope won't yield a realistic evaluation," the answer is: "Without narrowing the scope, you won't get results that enable a decision." A broad PoC is a deferral of decision-making and tends to become an exercise in avoiding decisions altogether.

Step 2: Designing Success Criteria and KPIs

Document what "success" means before the PoC begins. By reaching prior agreement on both quantitative and qualitative terms — defining what outcomes would lead to considering production rollout, and what results would trigger withdrawal — you can avoid ambiguity in interpreting the results.

Combining Quantitative and Qualitative Criteria

Relying on quantitative or qualitative criteria alone is insufficient. In AI-related PoCs, always combine both.

Examples of quantitative criteria

  • Machine learning metrics such as model accuracy, recall, and F1 score
  • Processing time reduction rate (compared to manual work)
  • Human intervention rate (proportion of AI outputs that a person had to correct or override before use)
  • Cost per unit task (including inference costs and labor costs)

Examples of qualitative criteria

  • User satisfaction (5-point scale with open-ended comments)
  • Fit with the existing workflow (how easily it can be integrated into current operations)
  • Learning cost (time required for users to become proficient)
  • Ease of handling exceptions (whether humans can take over cases the AI struggles with)

Relying on quantitative criteria alone risks overlooking outcomes where "accuracy was achieved but the solution didn't fit the workflow." Relying on qualitative criteria alone risks situations where "users were satisfied but ROI cannot be justified." Presenting both as evaluation inputs ensures that all the information needed for a production rollout decision is in place.

Determine the weighting of each criterion in advance. Document composite decision rules such as: "Even if quantitative targets are not met, a conditional GO may be issued if qualitative scores are exceptionally high."
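A minimal Python sketch of such a pre-agreed rule follows. It assumes each quantitative KPI has a target and a direction, that the qualitative criteria are scored on a 5-point scale (5 = best) and combined with weights fixed before the PoC, and it uses placeholder thresholds (3.5 for a normal GO, 4.5 for "exceptionally high") that each organization would set for itself. Any combination the rule does not explicitly allow falls through to NO-GO in this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Kpi:
    name: str
    target: float
    actual: float
    higher_is_better: bool = True

    @property
    def met(self) -> bool:
        if self.higher_is_better:
            return self.actual >= self.target
        return self.actual <= self.target


def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of 5-point qualitative scores, using weights fixed before the PoC."""
    if set(scores) != set(weights):
        raise ValueError("every qualitative criterion needs exactly one pre-agreed weight")
    return sum(scores[k] * weights[k] for k in scores) / sum(weights.values())


def decide(kpis: list[Kpi], qual_score: float,
           qual_threshold: float = 3.5, exceptional_qual: float = 4.5) -> str:
    all_met = all(k.met for k in kpis)
    if all_met and qual_score >= qual_threshold:
        return "GO"
    # Pre-agreed exception: quantitative targets missed, qualitative score exceptionally high.
    if not all_met and qual_score >= exceptional_qual:
        return "CONDITIONAL GO"
    return "NO-GO"


if __name__ == "__main__":
    kpis = [
        Kpi("processing time reduction", target=0.50, actual=0.62),
        Kpi("human intervention rate", target=0.30, actual=0.25, higher_is_better=False),
    ]
    # All criteria scored so that 5 is best (e.g. 5 = very low learning cost).
    qual = weighted_score(
        {"satisfaction": 4, "workflow fit": 4, "learning cost": 3, "exception handling": 4},
        {"satisfaction": 0.4, "workflow fit": 0.3, "learning cost": 0.1, "exception handling": 0.2},
    )
    print(f"qualitative score {qual:.2f} -> {decide(kpis, qual)}")
```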

Writing Failure Criteria First

At the start of a PoC, explicitly define the threshold below which the outcome is a NO-GO. A PoC that documents success criteria but not failure criteria tends to result in deferred judgment when outcomes are inconclusive — with responses such as "let's do a bit more validation" or "we can recover in the next phase."

Examples of failure criteria (these are reference values to be adjusted based on the specific workflow and challenge):

  • Accuracy falls below the minimum acceptable level for the workflow (e.g., 80%)
  • User acceptance rate (the proportion of PoC participants who respond "I would want to use this in production") falls below a certain threshold (e.g., 60%)
  • Per-unit processing cost exceeds that of manual work
  • Production rollout is deemed infeasible from a legal or compliance standpoint (e.g., PDPA consent acquisition, cross-border transfer of personal data, etc.)

These values vary significantly by industry, workflow, and data volume. Rather than fixing them as universal benchmarks, derive them by working backward from your organization's current operations and business value.
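Writing the failure criteria as code before the PoC starts makes them hard to reinterpret afterwards. The Python sketch below checks a result against the example thresholds above (80% accuracy, 60% acceptance) and compares per-task cost, counting both inference cost and the human labor that remains. The field names, the cost formula, and the numbers in the usage example are assumptions; costs only need to be in a single currency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PocResult:
    accuracy: float
    acceptance_rate: float       # share answering "I would want to use this in production"
    ai_cost_per_task: float      # inference plus remaining human labor, per task
    manual_cost_per_task: float  # fully manual baseline, per task
    legally_feasible: bool       # Legal/Compliance sign-off (PDPA consent, cross-border transfer)


def cost_per_task(inference_cost: float, human_minutes: float, hourly_rate: float) -> float:
    """Per-task cost = inference fee + labor for the human part of the task."""
    return inference_cost + human_minutes * hourly_rate / 60


def failure_reasons(r: PocResult, min_accuracy: float = 0.80,
                    min_acceptance: float = 0.60) -> list[str]:
    """Return every NO-GO trigger that fired; an empty list means none did."""
    reasons = []
    if r.accuracy < min_accuracy:
        reasons.append(f"accuracy {r.accuracy:.0%} below {min_accuracy:.0%}")
    if r.acceptance_rate < min_acceptance:
        reasons.append(f"acceptance {r.acceptance_rate:.0%} below {min_acceptance:.0%}")
    if r.ai_cost_per_task > r.manual_cost_per_task:
        reasons.append("per-task cost exceeds manual work")
    if not r.legally_feasible:
        reasons.append("production rollout not feasible from a legal/compliance standpoint")
    return reasons


if __name__ == "__main__":
    ai_cost = cost_per_task(inference_cost=2.0, human_minutes=3, hourly_rate=300)
    manual_cost = cost_per_task(inference_cost=0.0, human_minutes=45, hourly_rate=300)
    result = PocResult(0.86, 0.70, ai_cost, manual_cost, legally_feasible=True)
    print(failure_reasons(result) or "No failure criteria triggered")
```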

Defining failure in advance is not a declaration of defeat — it is a safeguard against unnecessary additional investment. A PoC that can issue a NO-GO decision early ultimately improves the efficiency of the organization's overall annual AI investment.

Step 3: Building Architecture and Evaluation Environment

Provisionally establish the architectural direction with production in mind during the PoC. A complete production system is not required, but the PoC design document should include an architecture diagram detailed enough to answer questions about how the system will behave in production.

Integration Methods with Existing Systems

In a PoC, the extent to which integration with existing systems is implemented becomes a major decision point. Full integration requires significant effort and puts pressure on the PoC timeline, but skipping integration entirely makes it impossible to estimate the cost of moving to production.

There are three realistic options:

  • Full API integration — Closest to the production setup. High effort. Choose this when productionization is considered certain, or when the integration layer itself is the key validation point.
  • Intermediate file method (CSV / JSON) — Files exported from existing systems are processed by AI, and result files are returned. The most commonly adopted practical solution in PoCs.
  • Standalone UI only — Decoupled from existing systems, evaluated through a dedicated UI. Increases the effort required for data preparation, but eliminates the need to adjust the integration layer.

Document both the chosen integration method and the method to be adopted in production as a pair. Explicitly stating the migration path — such as "intermediate files for PoC, API integration for production" — improves the accuracy of productionization estimates.
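For the intermediate file method, the whole integration layer can be as small as the Python sketch below: read an export, run the AI step row by row, and write results back in a format the existing system can import. The column names (`ticket_id`, `body`) and the keyword-based `classify` function are hypothetical stand-ins for the real export and model. Strings are used instead of file paths only to keep the example self-contained.

```python
import csv
import io
import json


def classify(text: str) -> str:
    """Hypothetical stand-in for the AI model or service being evaluated."""
    return "urgent" if "urgent" in text.lower() else "normal"


def process_export(csv_text: str) -> str:
    """Read a CSV export, apply the AI step to each row, return JSON for re-import."""
    reader = csv.DictReader(io.StringIO(csv_text))
    results = [{"ticket_id": row["ticket_id"], "label": classify(row["body"])}
               for row in reader]
    # ensure_ascii=False keeps Thai and Japanese text readable in the output file.
    return json.dumps(results, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    export = "ticket_id,body\nT-001,URGENT: production line stopped\nT-002,Quote request\n"
    print(process_export(export))
```

Because the AI step sits behind a single function, moving to API integration in production mainly means changing how rows arrive and where results go, which is one way to make the documented migration path concrete.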

Creating and Handling Evaluation Data

Evaluation data determines the credibility of a PoC. Evaluating with non-representative data makes it impossible to predict behavior in actual production operation.

There are five key considerations when designing evaluation data:

  • Sample size — A minimum of 100–500 cases; 1,000 or more for complex tasks. Too few cases make it impossible to conduct a statistically meaningful evaluation.
  • Representativeness — Include a balanced mix of typical high-volume business cases, seasonal variations, regional and departmental differences, and actual data from the past six months to one year.
  • Edge cases — Intentionally include noisy data, missing data, unexpected formats, and adversarial inputs.
  • Handling of personal information — When using production data as-is, apply masking or pseudonymization to comply with requirements such as Thailand's PDPA and Japan's Act on the Protection of Personal Information. Confirm in advance whether re-obtaining consent is necessary.
  • Dataset version control — Maintain a record of who created the evaluation data, when, where it was sourced from, and its change history. This prevents situations in the productionization phase where "the PoC results cannot be reproduced."
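The last two considerations can be combined in one preparation step. The Python sketch below replaces configured personal-data fields with keyed HMAC-SHA256 tokens, so the same customer always maps to the same token, and then records a manifest with a content hash, creator, source, and date. Field names and the manifest layout are assumptions. Keyed hashing is pseudonymization rather than anonymization, so whether the output still counts as personal data under the PDPA should be confirmed with Legal, and the key must be stored separately from the dataset.

```python
import hashlib
import hmac
import json
from datetime import date


def pseudonymize(value: str, key: bytes) -> str:
    """Deterministic keyed token: same input and key always give the same token."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def prepare(records: list[dict], pii_fields: set[str], key: bytes) -> list[dict]:
    """Copy records, replacing personal-data fields with pseudonymous tokens."""
    return [{k: (pseudonymize(str(v), key) if k in pii_fields else v) for k, v in r.items()}
            for r in records]


def manifest(records: list[dict], created_by: str, source: str, created_on: date) -> dict:
    """Version record for the evaluation set; append one per change to keep a history."""
    payload = json.dumps(records, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return {
        "sha256": hashlib.sha256(payload).hexdigest(),
        "num_records": len(records),
        "created_by": created_by,
        "source": source,
        "created_on": created_on.isoformat(),
    }


if __name__ == "__main__":
    raw = [{"customer_email": "somchai@example.com", "inquiry": "Delivery date?", "amount": 1200}]
    key = b"load-this-from-a-secret-store"  # placeholder; never hard-code a real key
    evaluation_set = prepare(raw, {"customer_email"}, key)
    print(json.dumps(manifest(evaluation_set, "qa-team", "CRM export", date.today()), indent=2))
```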

Allocate at least one to two weeks to designing the evaluation data. A PoC that skips this step is very likely to run into disputes when the results are interpreted.
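To see why the sample-size consideration matters, it helps to put a confidence interval around a measured accuracy. The Python sketch below uses the Wilson score interval, a standard interval for a proportion, chosen here for illustration rather than prescribed by this guide, at roughly 95% confidence. With 85 correct answers out of 100, the interval is about 0.77 to 0.91, which straddles an 80% failure threshold; with 425 out of 500, it narrows to about 0.82 to 0.88.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion; z=1.96 gives roughly 95% confidence."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return center - half, center + half


if __name__ == "__main__":
    for correct, total in [(85, 100), (425, 500)]:
        lo, hi = wilson_interval(correct, total)
        print(f"n={total}: accuracy {correct / total:.0%}, 95% CI {lo:.2f}-{hi:.2f}")
```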

Step 4: Designing the Production Readiness Gate

Design a "gate" to bring the PoC to a conclusion. By deciding in advance the criteria for GO / NO-GO / Conditional GO determinations and who the decision-makers are, the review meeting can arrive at a concrete decision rather than ending in open-ended discussion.

GO/NO-GO Decision Criteria

Share a table of items to be evaluated at the productionization decision gate at the start of the PoC. The final determination is made by completing this table.

Evaluation Axis | What to Confirm | Decision-Maker
Quantitative results | Were the pre-agreed KPIs achieved? | Business owner + IT department
Qualitative results | Did user satisfaction and business fit exceed the threshold? | Business owner
Productionization cost | Annual cost estimate including inference fees, operational labor, and additional development | Finance + IT department
Risk assessment | Regulatory risks including legal, operational, security, and PDPA | Legal + IT department
Alternative comparison | Feasibility of alternatives such as other vendor products, different technologies, or process improvement alone | Management + Business unit
Company-wide deployment potential | Changes required when scaling from one department to the entire organization | IT department + Business owner

Each item is rated on a five-point scale, and the GO / Conditional GO / NO-GO determination is made based on the total score and a minimum threshold for each item (e.g., Risk must score 3 or above). In the case of a Conditional GO, the additional verification items required must be explicitly stated.
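The scoring rule can be pinned down in a few lines so that the review meeting applies it rather than debates it. In the Python sketch below, the six axis keys mirror the table, the per-axis minimum defaults to the article's example (Risk 3 or above), and the total-score thresholds of 24 and 18 out of 30 are hypothetical placeholders. A Conditional GO is refused unless the additional verification items are listed.

```python
from typing import Optional, Sequence

AXES = ("quantitative", "qualitative", "cost", "risk", "alternatives", "scalability")


def gate_decision(scores: dict[str, int],
                  minimums: Optional[dict[str, int]] = None,
                  go_total: int = 24,
                  conditional_total: int = 18,
                  additional_checks: Sequence[str] = ()) -> tuple[str, list[str]]:
    """Return (decision, notes) from 1-5 scores on each gate axis."""
    minimums = {"risk": 3} if minimums is None else minimums
    missing = [a for a in AXES if a not in scores]
    if missing:
        raise ValueError(f"unscored axes: {missing}")
    if any(not 1 <= scores[a] <= 5 for a in AXES):
        raise ValueError("every axis must be scored on the 1-5 scale")
    # Per-axis floors apply first: a weak Risk score cannot be offset by other axes.
    below = [a for a, floor in minimums.items() if scores[a] < floor]
    if below:
        return "NO-GO", below
    total = sum(scores[a] for a in AXES)
    if total >= go_total:
        return "GO", []
    if total >= conditional_total:
        if not additional_checks:
            raise ValueError("a Conditional GO must list its additional verification items")
        return "CONDITIONAL GO", list(additional_checks)
    return "NO-GO", [f"total score {total} below {conditional_total}"]


if __name__ == "__main__":
    scores = {"quantitative": 3, "qualitative": 4, "cost": 3,
              "risk": 4, "alternatives": 3, "scalability": 3}
    print(gate_decision(scores, additional_checks=["Re-run accuracy test on Q4 data"]))
```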

Exit Criteria for NO-GO Decisions

NO-GO is not "failure" — it is a "decision." By designing the withdrawal process in advance for when a NO-GO outcome is reached, an organization can extract value from its PoC investment.

There are four tasks that must be carried out upon withdrawal:

  • Accumulating learnings — Document why it did not work across four axes: data, technology, operations, and organization. Put it in a form that can be reused in the next PoC or when another team attempts the same challenge.
  • Presenting the next approach — Organize alternatives, including non-AI options, such as: "Can this be substituted with manual process improvement?", "Should we consider a different technology (a higher-accuracy dedicated model, rule-based approach, or a different vendor)?", or "Should we revisit the business process itself?"
  • Clearly terminating the investment — Establish mechanisms to prevent zombie PoCs, such as "we will not re-run a PoC on the same theme in six months" or "the next PoC on this topic will be handled by a different team."
  • A failure-value review meeting — Hold a session attended by senior management and relevant departments to share the rationale behind the NO-GO decision. Make the organization's culture of not penalizing failure visible.

Organizations that have a PoC design capable of producing NO-GO decisions will see improved efficiency across their entire annual AI investment portfolio.

Common PoC Failures and Countermeasures

The following is an organized overview of failures commonly seen in PoCs, paired with countermeasures. Use this as a checklist during the design phase.

  1. Proceeding on the assumption that "if we have data, we can solve it" — Data quality, representativeness, and the effort required for labeling are underestimated, and the entire verification period ends up being consumed by data preparation. Countermeasure: Allocate a separate data preparation period of 2–4 weeks before the PoC begins, and exclude it from the PoC timeline.
  2. Leaving everything to the vendor with no involvement from the business side — The PoC built by the vendor works technically, but the business side feels "this isn't our work," and consensus-building stalls when productionization is proposed. Countermeasure: Formally include the business owner as a member of the PoC team and ensure they speak at weekly review meetings.
  3. Calculating productionization costs only at the end — After the PoC is complete, obtaining a productionization estimate results in "it's more expensive than expected," sending the project back to square one. Countermeasure: Estimate rough productionization costs during PoC design, and revise the plan at that point if ROI cannot be justified.
  4. Evaluation data that diverges from production — Accuracy is achieved with the curated validation data, but degrades significantly when production data is introduced. Countermeasure: Source at least 30% of evaluation data from masked production data to ensure production reproducibility.
  5. Decision-makers absent from the PoC review meeting — Members with decision-making authority do not attend the review meeting, and the decision is deferred to the next meeting. Countermeasure: Confirm the date of the review meeting and mandatory attendees at the PoC kickoff.
  6. Successful PoCs that never move forward — The review meeting ends with applause, but no productionization plan materializes and time passes. Countermeasure: Set a maximum of 4 weeks from PoC completion to the drafting of a productionization plan, and designate the responsible drafter in advance.
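Countermeasure 4 is simple arithmetic, but it is easy to get wrong with floating-point ratios. The Python sketch below returns how many additional masked production records are needed for them to make up at least a given share of the evaluation set; the 30% default follows the countermeasure, and `Fraction` is used so that exact ratios such as 3/10 do not round up by one.

```python
import math
from fractions import Fraction


def production_records_needed(curated: int, production: int,
                              min_share: Fraction = Fraction(3, 10)) -> int:
    """Extra masked production records so that production >= min_share of the whole set."""
    if not 0 <= min_share < 1:
        raise ValueError("min_share must be in [0, 1)")
    # production_total / (curated + production_total) >= s
    #   <=>  production_total >= s * curated / (1 - s)
    required_total = math.ceil(min_share * curated / (1 - min_share))
    return max(0, required_total - production)


if __name__ == "__main__":
    # 350 curated cases and no production data yet: 150 production records make 30% of 500.
    print(production_records_needed(curated=350, production=0))
```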

FAQ

Q1: What is an appropriate duration for a PoC?

The PoC itself should take approximately 6–8 weeks. Adding 2–4 weeks for data preparation and 2–4 weeks for drafting the productionization plan brings the overall cycle to roughly 10–16 weeks, or about 2.5–4 months. Extending the timeline makes it easy to defer decisions, while too short a period leaves insufficient time to gather evaluation data.

Q2: Is there a general guideline for PoC budget size?

Because costs vary significantly depending on the scope of operations, vendor involvement, and the depth of integration with existing systems, no fixed benchmark can be provided. As a decision-making guideline, many companies cap PoC investment at approximately 5–15% of the projected annual investment for productionization. If PoC investment is disproportionately large relative to the productionization outlook, the scope should be revisited.

Q3: Should the PoC be conducted in-house or outsourced to an external partner?

A practical division of responsibilities is to keep the business owner role, data preparation, and evaluation judgment in-house, while outsourcing technical implementation to an external partner. Even so, if technical implementation is handed off with no in-house involvement at all, building in-house support and maintenance capabilities will add cost during the productionization phase.

Q4: How long does it take to move from a successful PoC to productionization?

While this varies by business and organizational scale, it is common for the process from plan drafting to production release to take anywhere from 3 months to 1 year. The timeline varies significantly depending on the additional effort required for architecture decisions, data preparation, and operational structure building that become apparent during the PoC.

Q5: When conducting a PoC in Thailand, what issues require particular attention?

Compliance with Thailand's Personal Data Protection Act (PDPA) should be confirmed from the PoC stage. Specifically, this includes the consent status of personal information contained in evaluation data, the legality of cross-border data transfers (when using AI services outside Thailand), and the processes for responding to data subject rights (access and deletion requests). In business environments where Thai, English, and Japanese are mixed, multilingual performance evaluation of the model should also be added.

Summary

By designing an AI implementation PoC as a venue for "gathering the information needed to make a productionization decision" within the verification period, organizations can avoid it becoming a mere formality. The four steps explained in this article are as follows:

  • Step 1: Define scope and business challenges — Break down business challenges to the task level, and deliberately narrow the PoC scope to a 6–8 week timeline, one target operation, and 5–10 users.
  • Step 2: Design success criteria and KPIs — Set both quantitative and qualitative measures in tandem, and document failure criteria before the PoC begins.
  • Step 3: Build the architecture and evaluation environment — Provisionally establish the integration method with existing systems, the representativeness of evaluation data, and regulatory compliance including Thailand's PDPA, and clearly define the migration path to productionization.
  • Step 4: Design the productionization decision gate — Maintain a table of decision criteria for GO / Conditional GO / NO-GO, and run the review meeting as a venue for decision-making.

The quality of PoC design directly affects the probability of advancing to productionization. Before beginning verification, review the design document and confirm that none of the checklist items in this article have been overlooked.

For related reading, see Scaling AI Agents to Production Operations on scaling to live operations, How to Measure the Impact of AI Agent Implementation on practical approaches to impact measurement, and AI × Synthetic Testing on evaluation design using synthetic data.

Author & Supervisor

Yusuke Ishihara


Started programming at age 13 with MSX. After graduating from Musashi University, worked on large-scale system development including airline core systems and Japan's first Windows server hosting/VPS infrastructure. Co-founded Site Engine Inc. in 2008. Founded Unimon Inc. in 2010 and Enison Inc. in 2025, leading development of business systems, NLP, and platform solutions. Currently focuses on product development and AI/DX initiatives leveraging generative AI and large language models (LLMs).