AI Strategy · 7 min read

The agent washing checklist: 12 questions to ask before you buy AI software.

Half the "AI agents" on the market are language models wrapped in a Zapier flow with a confident name. Here are the 12 questions that tell you which half you’re looking at.

Peony Gerochi, Founder

"AI agent" has become the vendor word of 2026. It’s on every landing page, every demo, every LinkedIn ad. Half of them are genuinely agents. The other half are language models wrapped in a Zapier flow with a confident name on top.

The technical term for the second group is "agent washing." You won’t see it on the marketing site. You’ll see it six weeks into the pilot — when the "agent" can’t handle the fifth step of your workflow and your ops lead is copy-pasting data into a fallback spreadsheet.

Here are the 12 questions we ask every AI vendor before a client writes a check. Most vendors answer the easy ones in the sales call and dodge the hard ones until implementation. The dodging is the tell. If a vendor can’t answer them clearly, you have your answer.

Section 1: What does it actually do?

  1. Does it reason, or does it follow a script? An agent reasons about what to do next. A chatbot picks from a menu. Ask the vendor to run an unscripted scenario — a customer email they’ve never seen, a workflow branch nobody mapped. If the demo only works on the happy path, it’s not an agent.
  2. Walk me through the full decision loop. Who calls the model? What tools does it have access to? Where does the output go? Who reviews it? If the answer is vague, there’s probably a human in the loop they’re not telling you about.
  3. What happens at step 4? Most demos work for two or three steps. Step 4 is where the real test lives. Ask: "Once the agent has done steps 1 through 3, what’s the branching logic at step 4? Show me." Agent washing usually falls apart here.
  4. Is this an agent or an LLM with a nice UI? An agent takes actions in the world. It books the meeting, it updates the CRM, it sends the follow-up. An "AI assistant" that drafts text for a human to send is a writing tool, not an agent. Both are fine. They’re not the same thing, and they shouldn’t cost the same.
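The decision loop question has a concrete shape you can ask the vendor to draw on a whiteboard. As a rough sketch (the tool names, the `pick_action` policy, and the reviewer hook below are hypothetical stand-ins, not any vendor’s actual architecture), a real agent loop looks something like this — the model picks the next action, a tool executes it, and something reviews the output before the loop continues:

```python
# Hypothetical sketch of an agent decision loop. Nothing here is a real
# product's code; the point is the shape: decide -> act -> review -> repeat.

def update_crm(task):
    return f"CRM updated for {task}"

def send_followup(task):
    return f"follow-up sent for {task}"

TOOLS = {"update_crm": update_crm, "send_followup": send_followup}

def pick_action(task, done):
    # Stand-in for the model call: choose the next tool, or stop.
    for name in ("update_crm", "send_followup"):
        if name not in done:
            return name
    return None

def run_agent(task, reviewer=lambda output: True):
    done, log = set(), []
    while (action := pick_action(task, done)) is not None:
        output = TOOLS[action](task)      # the agent acts in the world
        if not reviewer(output):          # who reviews it?
            log.append(("escalated", action))
            break
        done.add(action)
        log.append((action, output))
    return log
```

If the vendor can’t fill in each slot of that loop — who plays `pick_action`, what’s in `TOOLS`, who is `reviewer` — you’re probably looking at a script with a model bolted on.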

Section 2: How does it fail?

  5. What does the error rate look like on tasks you haven’t seen? Every vendor shows you the 94% success rate. Ask about the 6%. What did those look like? How were they caught? Who paid for the mistake? If the vendor hasn’t thought about failure modes, they haven’t shipped this enough times.
  6. What’s the hallucination rate on our data? RAG systems hallucinate more when the data is messy. Yours is messy. Ask the vendor to run a test on a sample of your real content and show you the raw output — not a curated demo. You’ll learn more in 20 minutes than in four sales calls.
  7. What happens when it gets stuck? Does it loop forever? Does it escalate to a human? Does it silently fail and move on? We’ve seen all three. The right answer is "it fails loudly and routes to a human." Anything else is a time bomb.
  8. Who owns the mistake? If the agent sends a customer the wrong quote, who pays? If it books the wrong meeting, who explains it? Vendors love to talk about accuracy until something goes wrong. Get the accountability answer in writing.
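"Fails loudly and routes to a human" is also checkable in code review, if you get that far. A minimal sketch of the right behavior (the step function and the human queue are hypothetical placeholders): a bounded retry that escalates visibly instead of looping forever or swallowing the error.

```python
# Hypothetical sketch of "fail loudly, route to a human": retries are
# bounded, and exhaustion raises an exception AND lands in a human queue,
# rather than looping forever or silently moving on.

class Escalation(Exception):
    """Raised so the failure is loud, not silent."""

def run_step(step, task, max_attempts=3, human_queue=None):
    last_error = None
    for attempt in range(max_attempts):
        try:
            return step(task)
        except Exception as err:
            last_error = err              # remember why it's stuck
    # Out of attempts: hand off to a human and stop, visibly.
    if human_queue is not None:
        human_queue.append((task, str(last_error)))
    raise Escalation(f"failed after {max_attempts} attempts: {last_error}")
```

The two failure modes to watch for in a demo are the opposites of this: a `while True` with no attempt cap, or a bare `except: pass` that moves on as if nothing happened.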

Section 3: What happens after you sign?

  9. Who maintains this in 6 months? AI systems drift. Models get updated, data changes, edge cases accumulate. Ask who’s responsible for keeping it working and what you’re paying them for that work. If the answer is "you are," factor that into the price.
  10. What do I own when you leave? If the vendor disappears tomorrow, do you still have a working system? Can your team run it? Who has the prompts, the evaluation data, the orchestration code? If the answer is "we host everything," you don’t own this. You rent it.
  11. Show me a client who churned, and tell me why. Every vendor has churn. The ones who admit it are usually the ones worth trusting. The ones who claim 100% retention are either lying or haven’t sold enough deals to matter.
  12. What would make you walk away from this project? Our favorite question. A vendor who can describe a situation where they’d refund you is a vendor who’s thought about how this actually ends. A vendor who says "that won’t happen" hasn’t done this enough times to know it does.

The counterpoint

Some of these questions will annoy the sales team. Good. Annoyance is a useful filter. The vendors worth buying from answer them without getting defensive. The ones who get defensive are telling you something.

There’s also a version of this that’s too suspicious. Not every AI vendor is running a scam. Plenty are building real tools and shipping real outcomes. The goal isn’t to catch liars. The goal is to make sure you and the vendor mean the same thing when you say "agent."

The short version

Agent washing is the gap between "we call it an agent" and "it completes the task without you." The 12 questions close that gap. If a vendor can’t answer most of them in a 45-minute conversation, you have your answer — and you just saved yourself a failed pilot.

Print this out. Take it to your next demo. Watch what happens.
