15.01.2025 · 8 min read
Part 4: Agents Have a UX Problem
Intelligence isn't the bottleneck. Visibility is.
Early Release: This is a draft version of the article and will continue to evolve.
Co-authored with AI: This article was written in collaboration with AI tools.
Everyone is focused on making agents smarter. But there's a problem that gets far less attention: making agents manageable. Without the right interfaces, the human learning mechanism we described in article 2 breaks down. The system flies blind.
Intelligence will keep improving. Models will get faster, cheaper, more capable. That trajectory is well-funded and well-understood. What's less understood is that none of it matters much if the people working with agents can't see what's happening inside them.
Flying Blind
Most agent deployments today give you a chat interface and a place to upload documents. That's the entire management surface. You can talk to the agent and you can feed it information. What you can't do is see what it knows, understand why it made a particular decision, or systematically learn from its failures.
I saw this firsthand at a previous company. We had a customer-facing agent that the support team was responsible for. The team could feed information to the agent through chat-style interactions, and they could upload documents to the knowledge base. But they had no visibility into the RAG system that powered the agent's responses. They couldn't see which documents the agent retrieved for a given question, couldn't understand why it chose one piece of information over another, and had no way to create test cases that would catch recurring errors.
The result was predictable. When the agent gave a wrong answer, the team's only option was to try rephrasing the source material and hope it worked better next time. No diagnosis, no systematic improvement, just trial and error through a keyhole. The agent wasn't bad. The interface made it unmanageable.
This is more common than most people realize. A team buys or builds an agent, gets it working well enough to deploy, and then the people responsible for keeping it accurate day-to-day have almost no tools to do so. It's like giving someone a car with no dashboard, no mirrors, and no speedometer, then wondering why they keep crashing.
Interfaces as Learning Infrastructure
Article 2 argued that humans are how agent systems learn. Domain experts catch errors, identify edge cases, and encode corrections that make the system better over time. But that argument assumed something we didn't examine closely enough: that the interface actually supports this kind of learning.
Ask a few questions about any agent deployment and you'll quickly see whether the learning loop is intact or broken. Can the person using the agent see its reasoning? Can they trace which sources it used to construct an answer? When it gives a bad response, can they flag it and turn it into a test case with minimal friction? Can they update the agent's knowledge without filing a ticket with the engineering team?
If the answer to most of these is no, the flywheel from article 2 isn't spinning. Not because the AI isn't capable enough, but because the interface sits between the human and the agent like a wall. The domain expert has the knowledge to improve the system. The system has the capacity to improve. But the interface doesn't connect the two.
This is why UX is a more important problem than it appears at first glance. It's not about aesthetics or ease of use in the conventional sense. It's about whether the organizational learning mechanism actually functions. A beautifully designed chat interface that hides the agent's reasoning is worse, for this purpose, than an ugly dashboard that exposes it.
Ease Versus Controllability
Many agent platforms today optimize for ease of setup. Describe your company in a few sentences, upload your documents, and you have a working agent in minutes. For demos and proofs of concept, this works well. For anything that matters, it creates a trap.
The problem surfaces when the agent starts failing in ways that are hard to diagnose. A customer gets a confidently wrong answer. Was it a bad entry in the knowledge base? A prompt conflict between two instructions? Something the agent inferred from partial information? With a chat-based configuration, you often can't tell. The system that was easy to set up becomes impossible to debug.
Making agents easy to use and making them controllable are goals that pull in opposite directions. Ease wants to hide complexity. Controllability wants to expose it. For casual use cases, ease wins and should win. But for knowledge-critical deployments, the kind we've been discussing throughout this series, controllability is what separates systems that improve from systems that stagnate.
This doesn't mean the interface needs to be complex. It means it needs to be layered. A domain expert should be able to see what the agent knows at a glance, then drill into specific knowledge entries, then see how those entries influenced a particular response, then create a test case from that response, all without needing engineering support at any step. Each layer adds depth without adding friction to the layers above it.
What Good Looks Like (So Far)
No one has fully solved this yet. But the principles are becoming clearer through practice.
Visible reasoning. When an agent makes a decision, the people responsible for it should be able to see the chain of reasoning. Not just the final answer, but which knowledge it drew on, what it prioritized, and where it was uncertain. This isn't a nice-to-have. It's what makes diagnosis possible.
Source attribution. Every claim the agent makes should trace back to a specific source. When the answer is wrong, you need to know whether the problem is in the source material, in how the agent retrieved it, or in how it interpreted it. These are three different problems with three different fixes.
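To make this concrete, here is a minimal sketch of what an attributed response object could look like. All names here (`SourceRef`, `AgentResponse`, `attribute`) are hypothetical, not from any particular framework; the point is that the trace travels with the answer, so a reviewer can check each claim against what was actually retrieved.

```python
from dataclasses import dataclass, field

@dataclass
class SourceRef:
    """A knowledge entry the agent drew on, with its retrieval score."""
    doc_id: str
    excerpt: str
    score: float

@dataclass
class AgentResponse:
    """An answer plus the trace needed to diagnose it."""
    answer: str
    sources: list                                  # SourceRef, ordered by influence
    reasoning: list = field(default_factory=list)  # free-text reasoning steps

    def attribute(self, claim_keyword: str):
        """Return the sources whose excerpts mention a keyword from a claim,
        so a reviewer can separate bad-source from bad-retrieval problems."""
        return [s for s in self.sources
                if claim_keyword.lower() in s.excerpt.lower()]
```

If `attribute()` comes back empty, retrieval never surfaced supporting material; if it returns a source containing a wrong fact, the fix belongs in the knowledge base, not the model.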
Domain expert tooling. The people who know the domain (support leads, compliance specialists, operations managers) need to be able to refine prompts, manage knowledge entries, and build test cases directly. If every improvement requires an engineer, the feedback loop slows to the pace of sprint planning, and most corrections never happen at all.
Versioning. When agent behavior changes, you need to understand what changed and why. Did someone update a knowledge base entry? Did a prompt get modified? Versioning isn't just about rollbacks. It's about building an organizational understanding of cause and effect in the system.
Diagnostic monitoring. Standard monitoring tells you that something went wrong. Agent monitoring needs to tell you where in the system it went wrong. Was it retrieval? Reasoning? A gap in the knowledge base? An outdated source? The difference between "the agent gave a bad answer" and "the agent gave a bad answer because it retrieved an outdated policy document" is the difference between frustration and a fixable problem.
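One way to sketch that triage, with illustrative thresholds and an assumed trace format (each retrieved entry carrying a similarity `score` and an `updated_at` timestamp):

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(days=180)  # illustrative staleness threshold

def classify_failure(sources, min_score=0.35):
    """Triage a bad answer into the stage where it likely went wrong.
    `sources` is the list of entries retrieved for the response, each a
    dict with 'score' (retrieval similarity) and 'updated_at' (datetime)."""
    if not sources:
        return "knowledge_gap"      # nothing relevant existed to retrieve
    if max(s["score"] for s in sources) < min_score:
        return "retrieval"          # material may exist but wasn't surfaced
    now = datetime.now(timezone.utc)
    if any(now - s["updated_at"] > STALE_AFTER for s in sources):
        return "outdated_source"    # the agent faithfully used an old document
    return "reasoning"              # good, fresh sources; the model misused them
```

Each label maps to a different owner: a knowledge gap goes to the domain expert, a retrieval miss to engineering, an outdated source to whoever maintains the documents.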
Test cases as organizational memory. Every corrected error should become a test case. Over time, these test cases form a library that encodes what the organization has learned about how the agent should behave. This is the executable version of institutional knowledge, and it's one of the most valuable assets an agent deployment produces.
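A corrected error needs very little structure to become durable. Here is a hypothetical minimal shape for such a test case, with a containment check as the simplest possible pass condition (real deployments would likely want richer assertions):

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class AgentTestCase:
    question: str
    bad_answer: str          # what the agent actually said
    expected_fragment: str   # text the corrected answer must contain
    corrected_by: str        # the domain expert who made the call
    note: str = ""

def save_case(case: AgentTestCase, path: str = "agent_test_cases.jsonl") -> None:
    """Append the corrected error to a JSONL library that regression runs
    can replay against the agent before each knowledge or prompt change."""
    with open(path, "a") as f:
        f.write(json.dumps(asdict(case)) + "\n")

def passes(case: AgentTestCase, answer: str) -> bool:
    """Minimal check: the agent's new answer must contain the expected fragment."""
    return case.expected_fragment.lower() in answer.lower()
```

Replaying the library after every change is what turns individual corrections into the institutional memory the paragraph above describes: a regression that once slipped through can never silently return.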
A Design Problem
It's tempting to frame all of this as secondary. The real challenge is intelligence, the argument goes, and the interface is just packaging. There's truth in that. Model capabilities are still the primary constraint for many use cases, and improvements in reasoning, context handling, and tool use will unlock things that no amount of good UX can compensate for.
But for organizations deploying agents today, the interface problem is more immediate and more underestimated than the intelligence problem. The models are already capable enough to be useful in many domains. What's missing is the surface that lets humans and agents work together effectively: the tools for seeing, understanding, correcting, and improving.
The companies building these interfaces, the ones that let non-technical domain experts develop and maintain agents with real depth and control, are solving a problem that will matter for as long as humans need to stay in the loop. And based on everything we've discussed in this series, that will be a long time.
We've been treating AI as if the hard problem is purely intelligence. It's not the only hard problem. The interface, the surface where human understanding meets machine capability, determines whether the system actually learns. Get it wrong, and it doesn't matter how smart your agent is, because no one can see what it's doing or steer it when it drifts. Get it right, and even a modest agent becomes part of a system that improves over time. The bottleneck isn't always the AI. Often, it's the UX.