Here's a conversation I keep having: a CTO tells me their team built an AI proof of concept six months ago. It works. The demo impresses people. They've been "working on getting it to production" ever since. I ask what's blocking it. The answer is always some version of "we need to figure out the infrastructure" or "we're still deciding on the monitoring approach."
The model isn't the problem. The problem is everything around the model that nobody thought about until the demo was done.
The Pattern
It goes like this. Someone in leadership reads about AI. They want in. A small team gets two weeks to build a proof of concept. They pick a model, connect it to some data, build a clean interface, demo it to the board. Everyone nods. "Ship it."
Then nothing happens for months. Because shipping an AI feature isn't the same as shipping a CRUD endpoint. The questions that come up after the demo are hard ones: how do we handle model updates without downtime? What happens when the model gives a wrong answer — who's responsible? How do we measure whether it's actually working in production? Who maintains the training data?
These aren't technical questions. They're organisational questions. And they weren't part of the pilot scope because the pilot was scoped to "prove the technology works," not "prove the organisation can support it."
What's Actually Blocking You
From the engagements I've been part of, three things kill AI pilots before they reach production:
- No success metric defined before the pilot started. The team proved the model could process the data. They didn't prove it could do so at a cost, speed, and accuracy level that makes business sense. Without a clear "this is good enough to ship" threshold, the pilot becomes an infinite optimisation loop.
- Model picked before use case defined. The conversation started with "we should use GPT-4" or "we need a RAG pipeline," not "we need to reduce the time our support team spends classifying tickets by 40%." The tool was the starting point, not the problem. This leads to features that are technically impressive but solve something nobody was actually struggling with.
- Nobody owns the integration. The data science team built the model. The product team defined the feature. Engineering needs to integrate it. But engineering wasn't in the room when the architecture decisions were made, and now the model expects data in a format the production system doesn't produce. The handoff is where pilots die.
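To make that last point concrete, here is a minimal, entirely hypothetical sketch of the kind of mismatch I mean. The field names, the nested event shape, and the CRM lookup are all invented for illustration; the point is that writing this adapter during the pilot, rather than after it, is what surfaces the gap and forces someone to own it.

```python
from dataclasses import dataclass

# Hypothetical contract the data science team's model was built against:
# a flat record with pre-joined fields.
@dataclass
class ModelInput:
    ticket_text: str
    channel: str
    customer_tier: str

def lookup_tier(customer_id: str) -> str:
    # Placeholder for the CRM call the production system would have to make.
    raise NotImplementedError("pilot scope question: who owns this lookup?")

def to_model_input(event: dict) -> ModelInput:
    """Adapter from a (hypothetical) production ticket event to the model's input.
    Writing it exposes the mismatch early: the event nests customer data and
    doesn't carry the tier at all, so integration needs an extra dependency."""
    return ModelInput(
        ticket_text=event["body"]["plain_text"],
        channel=event["source"],  # e.g. "email", "chat"
        customer_tier=lookup_tier(event["customer"]["id"]),
    )
```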
How to Structure an AI Initiative That Ships
I've seen this work best when the AI initiative is treated as an engineering project, not a research project. The distinction matters:
Research mode: "Can we build something with this model?" The answer is almost always yes. This is the wrong question.
Engineering mode: "If we solve [specific problem] for [specific users], we expect [measurable outcome]. We'll build the smallest version that tests this, and if it works, we already know how to deploy it."
The practical difference:
- Define the business metric first. Not "accuracy" or "F1 score." A business metric: time saved per ticket, reduction in manual data entry hours, increase in conversion rate. If you can't state the metric, you can't ship to it.
- Scope the integration from day one. The pilot should run against production-adjacent infrastructure, not a Jupyter notebook on someone's laptop. If the model needs real-time data, build the data pipeline as part of the pilot, not after.
- Assign an engineering owner before the pilot starts. Not a data scientist. An engineer who will be responsible for deploying and maintaining the feature in production. They don't need to build the model, but they need to understand what it needs to run.
- Set a ship-or-kill date. Four to six weeks. If the pilot can't demonstrate the business metric in that time, the use case is wrong or the approach is wrong. Either way, stop and reassess. Perpetual pilots burn credibility with the wider team.
The most successful AI features I've seen shipped weren't the most technically sophisticated. They were the ones where the team knew exactly what "done" looked like before they started building.
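One lightweight way to force that clarity is to write the ship criteria down as data before the pilot starts. The sketch below is illustrative only: the thresholds, field names, and the "auto-resolution rate" metric are placeholders, not recommendations. What matters is that the pilot review becomes a mechanical check against numbers agreed up front, not a debate.

```python
from dataclasses import dataclass

@dataclass
class ShipCriteria:
    """Thresholds agreed before the pilot starts. All numbers are placeholders."""
    max_cost_per_ticket_usd: float = 0.05
    max_p95_latency_s: float = 2.0
    min_auto_resolution_rate: float = 0.40  # the business metric, not a model metric
    deadline_weeks: int = 6                 # the ship-or-kill date

@dataclass
class PilotResult:
    cost_per_ticket_usd: float
    p95_latency_s: float
    auto_resolution_rate: float
    weeks_elapsed: int

def ship_decision(result: PilotResult, criteria: ShipCriteria) -> str:
    if result.weeks_elapsed > criteria.deadline_weeks:
        return "kill: past the ship-or-kill date, reassess the use case or approach"
    if (result.cost_per_ticket_usd <= criteria.max_cost_per_ticket_usd
            and result.p95_latency_s <= criteria.max_p95_latency_s
            and result.auto_resolution_rate >= criteria.min_auto_resolution_rate):
        return "ship"
    return "iterate: name the threshold that failed and by how much"
```

If the team can't fill in the `ShipCriteria` fields before building starts, that is the signal the use case isn't ready for a pilot yet.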
The Build vs Buy Question
Not everything needs a custom model. Most internal AI use cases — document search, email classification, data extraction, summarisation — can be solved with existing APIs and some prompt engineering. The custom model approach makes sense when you have proprietary data that gives you a real edge, or when the domain is specialised enough that general models perform poorly.
The mistake is treating "build vs buy" as a one-time decision. Start with the fastest path to a working feature (usually an API). If it hits the business metric, ship it. If it doesn't, the gap between what the API provides and what you need tells you exactly what a custom model has to solve. Now you're building a model to fill a specific gap, not building a model and hoping it's useful.
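As a sketch of what the fastest path can look like, here is ticket classification done with an off-the-shelf API and a prompt, no custom model. This assumes the OpenAI Python SDK and an API key in the environment; the model name, categories, and prompt are placeholders to adapt to your own use case.

```python
# "Buy first": classify support tickets with an existing API plus prompt engineering.
# Assumes the OpenAI Python SDK (pip install openai) and OPENAI_API_KEY set.
from openai import OpenAI

client = OpenAI()

CATEGORIES = ["billing", "bug_report", "feature_request", "account_access", "other"]

def classify_ticket(ticket_text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; pick whatever clears your cost and latency thresholds
        messages=[
            {"role": "system",
             "content": "You classify support tickets. Reply with exactly one of: "
                        + ", ".join(CATEGORIES) + "."},
            {"role": "user", "content": ticket_text},
        ],
        temperature=0,
    )
    label = response.choices[0].message.content.strip().lower()
    return label if label in CATEGORIES else "other"

if __name__ == "__main__":
    print(classify_ticket("I was charged twice for my subscription last month."))
```

If something this simple clears the business metric, ship it. If it doesn't, the gap between what it returns and what you need is the spec for anything custom you build later.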
Where Advisory Fits
The reason these pilots stall isn't a lack of AI talent. It's a gap between the AI team and the engineering organisation. The data scientist knows transformers. The engineering team knows the production system. Nobody is translating between them.
That's where external advisory helps. Not to build the model — the team can do that. But to structure the initiative so the model actually ships: define the right metrics, scope the integration early, set the ship-or-kill timeline, and make sure the handoff between research and engineering doesn't lose three months to miscommunication.
CoolMinds advises engineering teams on AI adoption — not hype, but structured initiatives that reach production. If your AI pilot has been "almost ready" for more than two months, let's talk about what's actually blocking it.
Start a conversation