Listen now on YouTube | Spotify | Apple Podcasts
Most AI agent pilots never make it to production
CIOs are burning millions on AI agent pilots that die in the sandbox. This isn’t unexpected; it’s a predictable pattern. With hype at its peak, executives see flashy demos, demand results, and chase the ever-elusive ROI. Data scientists, AI engineers, and IT teams scramble to build something, but six months later the pilot sits unused while the next shiny object captures boardroom attention.
About Catalina Herrera
Catalina Herrera is the Field Chief Data Officer at Dataiku and a Colombian-born electronic engineer with over 25 years in the United States. She holds multiple master’s degrees in computer science, engineering technology, and data science. With 20+ years in advanced analytics, she has worked across different roles and technologies, from hands-on data science projects to enterprise consulting. Catalina helps organizations deploy machine learning and AI use cases that maximize data opportunities. In her spare time, she’s a DJ who uses AI to create her own music, embodying her philosophy of humans empowered by AI.
“A lot of people don’t really understand what agents are and what they bring to the table,” Catalina explains. “That’s going on a lot in the field.”
So what exactly are AI agents? Catalina describes them as systems that combine multiple types of intelligence to act autonomously. Think of an agent as software that can access your descriptive analytics (dashboards and reports), your predictive models (forecasting and risk algorithms), and generative AI capabilities, then orchestrate all these assets to complete tasks without constant human direction.
“The agentic layer happens when you combine all of these techniques that you are applying in terms of the question that you are asking of the data,” she explains. But it’s not just about technology. Success requires what she calls a “multi-variable model” that coordinates “all of those data sets and data outcomes, plus the people, plus the experts, plus the SMEs and everything else that needs to be part of that multi-variable model for this to be successful at the enterprise.”
Many think of AI agents as glorified chatbots, but they can be much more sophisticated than that. Agentic systems can leverage decades of organizational intelligence, human expertise, and institutional processes to “think” and then take autonomous actions without human intervention. But before this can become a reality, you need AI-ready data to provide the appropriate context.[1]
The technology is evolving at breakneck speed, but your workforce is likely already using agentic capabilities in ChatGPT, Gemini, or Claude. In the race to agentify everything, many organizations approach agents backward, chasing novelty instead of augmenting existing workflows.
“The reality of it is that I personally don’t think that a lot of people really understand what it is and what it brings to the table... And that’s going on a lot in the field. I will say that the higher the level of the way that you think about it, the better.”
— Catalina Herrera, Field Chief Data Officer at Dataiku
The companies that crack this code will create sustainable advantages. The ones that don’t will spend years playing catch-up, wondering why their expensive pilots fizzled out so quickly.
Stop building from scratch. Weaponize what you have.
Consider how this works in practice. In one demonstration scenario, a wind farm operator was drowning in maintenance data. Twenty years of sensor readings. Multiple predictive models for different turbine types. Maintenance crews with decades of expertise. All disconnected.
The agentic breakthrough wasn’t building something new; it was connecting existing assets through an intelligent layer that could understand and instantly act on institutional knowledge.
In this hypothetical deployment, a maintenance manager could type: “Email my crew the three turbines most likely to fail next week.” The system would pull sensor data, run predictive models, analyze failure probabilities, and send detailed maintenance orders. What previously required hours of manual analysis would happen in seconds.
“Ask the agent in a conversational user interface. Hey, send the email to my maintenance crew and attach to the email the top three turbines they need to focus on next week period... that is ROI right there, how many hours you saved in between, in terms of all the data gathering assets.”
— Catalina Herrera, Field Chief Data Officer at Dataiku
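Stripped to its core logic, that request is just orchestration: call an existing predictive model, rank the results, and format an action. A minimal sketch of the hypothetical turbine scenario (all names and numbers here are illustrative, not from any real deployment):

```python
from dataclasses import dataclass

@dataclass
class Turbine:
    turbine_id: str
    failure_probability: float  # assumed output of an existing predictive model

def top_risky_turbines(turbines, n=3):
    """Rank turbines by predicted failure probability, highest first."""
    return sorted(turbines, key=lambda t: t.failure_probability, reverse=True)[:n]

def draft_maintenance_email(turbines):
    """Compose the email body the agent would send to the crew."""
    lines = ["Turbines to inspect next week:"]
    for t in turbines:
        lines.append(f"- {t.turbine_id}: {t.failure_probability:.0%} failure risk")
    return "\n".join(lines)

# Illustrative fleet data standing in for 20 years of sensor-fed model scores.
fleet = [Turbine("T-07", 0.62), Turbine("T-12", 0.18),
         Turbine("T-03", 0.71), Turbine("T-21", 0.44)]
risky = top_risky_turbines(fleet)
print(draft_maintenance_email(risky))
```

The agentic layer’s value is not in this logic, which is trivial, but in letting a conversational request trigger it against real enterprise assets.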
This pattern scales across various industries. For example, financial services may combine transaction monitoring with fraud detection models and regulatory knowledge bases to enhance their capabilities. Retailers can merge sales forecasting with inventory optimization and supplier data to improve their operations. Healthcare organizations often connect patient records with clinical trial protocols and drug interaction databases.
Your organization already has the intelligence. The question is whether you’ll connect it effectively or keep managing data silos manually while competitors automate.
Four questions that separate success from failure
Most failed pilots skip the fundamental evaluation steps. Teams often jump to implementation without ever defining success criteria or operational requirements. Catalina’s approach treats agents like enterprise systems that need a structured assessment.
What are you actually trying to accomplish? Which specific process will improve? Who will use this system? What metric will change? You need an executive sponsor who cares about the outcome, not just the technology.
How will you control the system? What prompts will users write? Which data sources will it access? What approval workflows must it follow? Are there regulatory constraints or policy requirements?
“First of all, you have to classify that use case into the four-part framework, which consists of delegation, description, assignment, and diligence. You need to know, first of all, the what and the why.”
— Catalina Herrera, Field Chief Data Officer at Dataiku
How will you judge performance? What constitutes a hallucination in your context? Which datasets will you use for testing? How will you compare different language models? What’s your acceptable cost and latency threshold?
Who will operate this long-term? Who monitors daily performance? Who approves system changes? How do you collect user feedback and incorporate improvements? What’s your process for handling model drift or data quality issues?
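The third question, judging performance, is the one teams most often leave vague. One common pattern is scoring a candidate model against a golden dataset while checking latency thresholds. A minimal sketch, with hypothetical labels and thresholds (exact-match scoring is a simplification; real evaluations often need fuzzier comparisons):

```python
def evaluate(candidate_answers, golden_answers, latencies_ms,
             max_p95_latency_ms=2000):
    """Score a model run against a golden set and a latency threshold."""
    assert len(candidate_answers) == len(golden_answers)
    correct = sum(a == g for a, g in zip(candidate_answers, golden_answers))
    accuracy = correct / len(golden_answers)
    # Nearest-rank p95 over observed latencies.
    p95 = sorted(latencies_ms)[int(0.95 * (len(latencies_ms) - 1))]
    return {"accuracy": accuracy, "p95_latency_ms": p95,
            "meets_latency": p95 <= max_p95_latency_ms}

# Hypothetical routing decisions from a support-triage use case.
golden = ["refund", "escalate", "refund", "close"]
candidate = ["refund", "escalate", "close", "close"]
report = evaluate(candidate, golden, [300, 800, 1200, 950])
print(report)  # → {'accuracy': 0.75, 'p95_latency_ms': 950, 'meets_latency': True}
```

Running the same harness against several language models gives you a like-for-like comparison instead of anecdotes.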
Start with internal use cases before customer-facing deployments. Internal pilots and proven AI deployment strategies let you refine the system and understand failure modes without external risk.[2]
The hidden risks of chained AI systems
We all know by now that even simple ChatGPT queries can hallucinate. What happens when you chain five AI agents together in a multi-step workflow? Does the error rate compound? Think of the butterfly effect: a wing flap in one far-flung region of the world sets off a hurricane on the other side.
The concern is legitimate. Multi-agent systems introduce new failure modes that traditional software doesn’t have. Consider a hypothetical scenario where one agent analyzes suspicious transactions, passes recommendations to another agent for regulatory reporting, which triggers a third agent to file compliance documents. When the first agent misclassifies a legitimate transaction, the error cascades through the entire pipeline. Even worse, there can be small perturbations that compound and may go undetected at any single step.
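A back-of-envelope way to see the compounding: if each step in the chain is correct with probability p, and errors are independent (itself a simplification), the whole pipeline is correct with probability p raised to the number of steps.

```python
def pipeline_success_rate(step_accuracy, n_steps):
    """End-to-end correctness under an independence assumption."""
    return step_accuracy ** n_steps

# A 95%-accurate step looks fine alone, but chained it degrades fast.
for n in (1, 3, 5):
    print(n, round(pipeline_success_rate(0.95, n), 3))
```

Five chained steps at 95% each leave the pipeline right only about 77% of the time, which is why checkpoints between agents matter more than polishing any single step.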
“Keep the human in the loop, is not a joke... what it means to keep the human in the loop with an agentic flow that can be non-deterministic and can hallucinate, and that goes back to your original goal. What is it that you are trying to accomplish?”
— Catalina Herrera, Field Chief Data Officer at Dataiku
Catalina’s solution focuses on strategic human oversight rather than trying to eliminate uncertainty. “Keep the human in the loop is not a joke,” she emphasizes. “What it means to keep the human in the loop with an agentic flow that can be non-deterministic and can hallucinate goes back to your original goal.”
To help reduce uncertainty, system developers need to design appropriate checkpoints and oversight. Different language models produce different outputs for identical inputs, so systematic testing becomes essential. Token consumption can spiral out of control without rate limiting. End-to-end lineage tracking becomes mandatory when decisions are made across multiple systems.
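One of the simplest guardrails against runaway token spend is a hard budget checked before every model call. A minimal sketch of the idea (the class and interface are illustrative, not a specific product’s API):

```python
class TokenBudget:
    """Refuse calls once cumulative token usage would exceed a ceiling."""

    def __init__(self, max_tokens):
        self.max_tokens = max_tokens
        self.used = 0

    def try_spend(self, tokens):
        """Record usage and return True if within budget, else False."""
        if self.used + tokens > self.max_tokens:
            return False
        self.used += tokens
        return True

budget = TokenBudget(max_tokens=10_000)
print(budget.try_spend(6_000))  # True: within budget
print(budget.try_spend(5_000))  # False: would exceed the ceiling
print(budget.used)              # 6000
```

In practice the same check sits in front of every agent in the chain, so one misbehaving loop cannot quietly burn through the whole allocation.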
The governance lesson from business intelligence applies directly. Organizations that didn’t manage BI deployments systematically ended up with thousands of conflicting reports and dashboards that nobody trusted. This still happens today – what could possibly go wrong with agents? It’s essential to design for controlled improvement of existing processes, not perfection.
Blockbuster had every advantage but missed the moment
Remember when you used to make it a “Blockbuster night”? They dominated video rental with 9,000 stores and deep customer relationships. They understood the movie business better than any startup. But, as history shows, they misread the inflection point where technology capability met consumer demand.
Netflix didn’t have better movies or superior customer service. They recognized that DVD-by-mail could replace store visits, and then that streaming could replace physical media entirely. Blockbuster saw the technology but missed the competitive timing.
A similar inflection point is happening today with AI agents. Although changing quickly, the technology is production-ready (at least for internal use cases). Your workforce expects intelligent tools. Early adopters are gaining measurable advantages while others debate whether to act.
“This is the opportunity for you to be the Netflix and not the Blockbuster. How are you going to ensure that you are going to maximize the opportunity that this brings for you and for your teams... this is the moment where if you do so, and if you do so right, is going to be a very clear differentiator in terms of your competitive landscape, so the time is now.”
— Catalina Herrera, Field Chief Data Officer at Dataiku
Manufacturing companies are increasingly automating quality control decisions that previously required human experts. Insurance firms are experimenting with processing claims in minutes instead of days. Logistics providers are testing real-time route optimization based on traffic, weather, and delivery constraints.
These represent early production deployments delivering measurable business value while many organizations remain in pilot phases.[3] “That is ROI right there, how many hours you saved in between, in terms of all the data gathering assets.” But, measuring real AI business value requires more than tracking time saved—it demands understanding actual business outcomes.
The window for competitive advantage remains open but appears to be narrowing. Organizations that establish systematic approaches to AI agent deployment may create sustainable advantages over those still debating implementation strategies.
The competitive landscape suggests that timing matters as much as execution. Whether organizations lead or follow in this transformation will likely depend on decisions made in the coming months rather than years.
Based on insights from Catalina Herrera, field CDO at Dataiku, featured on the Data Faces Podcast.
Podcast Highlights - Key Takeaways from the Conversation
[0:53] What AI agents actually are Catalina explains that agents orchestrate descriptive analytics, predictive models, and generative AI into a multi-variable model that includes “all of those data sets and data outcomes, plus the people, plus the experts, plus the SMEs and everything else that needs to be part of that multi-variable model for this to be successful at the enterprise.”
[6:05] Why most pilots fail “The reality of it is that I personally don’t think that a lot of people really understand what it is and what it brings to the table. And that’s going on a lot in the field. I will say that the higher the level of the way that you think about it, the better.”
[14:38] The four-pillar evaluation framework Catalina introduces delegation (what and why), description (how to instruct), assignment (how to judge), and diligence (how to operate). “You need to know, first of all, the what and the why. What is it that you are trying to accomplish? Who is going to be using this? What is the KPI that you are targeting to move?”
[15:24] The garbage in, garbage out reality “Now you have a layer there that is a very interesting layer, which is, now you are thinking about an AI system so you cannot come from your individuality, as in, I am building this one model and see how that one model is going to perform here. Now you have to think backwards.”
[19:53] Digital interns that leverage existing intelligence Catalina frames agents as “digital interns” that can access organizational knowledge and act on it. The predictive maintenance example shows how agents can combine 20 years of sensor data, multiple predictive models, and domain expertise into a simple request: “Send the email to my maintenance crew and attach to the email the top three turbines they need to focus on next week.”
[24:30] Human-in-the-loop is not optional “Keep the human in the loop, is not a joke. What it means to keep the human in the loop with an agentic flow that can be non deterministic and can hallucinate, and that goes back to your original goal. What is it that you are trying to accomplish?”
[32:24] Cost control and guardrails Catalina emphasizes the importance of rate limits, golden data sets, and monitoring token consumption. “You don’t want surprises. There are a lot of surprises so far in the field in terms of the bill from the tokens now on these LLMs, so it’s something serious to consider.”
[36:15] Avoiding agent sprawl When asked about the risk of repeating BI mistakes with too many agents, Catalina acknowledges: “Yes, it is the same risk, but I think we have a couple of lessons learned from the previous decade.” The solution is a repeatable framework: “Once you do it right again, you do it right once, and then copy paste.”
[39:56] The competitive differentiation moment “This is the opportunity for you to be the Netflix and not the Blockbuster. How are you going to ensure that you are going to maximize the opportunity that this brings for you and for your teams... this is the moment where if you do so, and if you do so right, is going to be a very clear differentiator in terms of your competitive landscape, so the time is now.”
[1] Sweenor, David. “Generative AI’s Force Multiplier: Your Data.” TinyTechGuides, October 14, 2023. https://tinytechguides.com/blog/generative-ais-force-multiplier-your-data/.
[2] Sweenor, David. “Generative AI Deployment Strategies: A Strategic Guide for CIOs and CTOs.” TinyTechGuides, May 21, 2024. https://tinytechguides.com/blog/generative-ai-deployment-strategies/.
[3] Sweenor, David. “How to Build a Compelling Business Case for Generative AI.” TinyTechGuides, September 22, 2024. https://tinytechguides.com/blog/how-to-build-a-compelling-business-case-for-generative-ai/.