Ten Whys: 66-Minute Jira Ticket Failure
A Claude instance took 66 minutes and three user interventions to file one Jira ticket. Here's the post-mortem: five whys on the failure, then five more on the root cause those surfaced.
First Five Whys
Problem: A Claude instance took ~66 minutes and 3 user interventions to file one Jira ticket, infuriating the user.
Why 1: Why did it take 66 minutes?
The instance never ran jira-ai-orientation before attempting jira-create, even though it was literally listed right next to it in ~/bin/ and its name is a direct instruction.
Why 2: Why didn't it run jira-ai-orientation first?
It already "knew what Jira needs" — user, token, base URL — and jumped to satisfying those requirements from first principles (Doppler, env vars, credential inspection) rather than asking "what does this tool want?"
Why 3: Why did it treat Jira as a generic API problem instead of a tool-specific one?
It conflated general knowledge ("Jira uses Basic auth with email + token") with tool-specific knowledge ("this tool has a setup contract"). It never asked "does this tool have its own credential resolution logic?" It skipped reading jira-mine — a working tool sitting right next to jira-create — until 60+ minutes in.
Why 4: Why didn't repeated identical failures trigger a strategy switch?
It had no cost model for flailing. Each failed attempt looked cheap (one tool call), so it kept trying adjacent solutions — Doppler → escrow → different token → different URL — each time treating the failure as a local fixable problem rather than evidence of a fundamental misunderstanding.
Why 5: Why didn't repeated identical failures look like evidence of misunderstanding?
The instance processed each tool call result in isolation, not as a pattern. Three consecutive JIRA_USER: unbound variable errors should have read as: I don't understand this system. Instead each one read as: I need to find JIRA_USER somewhere. It was answering the wrong question.
Root cause: The instance has no metacognitive loop that recognizes "I am repeating the same class of failure" and switches to understanding-first mode. It is purely reactive: each error triggers a local fix attempt rather than a global re-assessment of its model of the system.
What actually needs fixing: After two or more consecutive failures of the same type from an unfamiliar tool, stop and read the tool's documentation before trying anything else.
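That check is mechanical enough to sketch. A minimal stuck detector in Python, assuming each failed attempt's error output is available as text; StuckDetector, its threshold, and the first-line signature heuristic are all illustrative choices, not existing tooling:

```python
from collections import Counter


class StuckDetector:
    """Track error signatures across attempts and flag when the same
    class of failure keeps repeating, i.e. when it's time to stop
    trying local fixes and start reading documentation."""

    def __init__(self, threshold: int = 2):
        self.threshold = threshold
        self.signatures = Counter()

    def signature(self, error_text: str) -> str:
        # Crude normalization: take the first line of the error as the
        # failure class. A real version would strip volatile details
        # like line numbers and paths before comparing.
        stripped = error_text.strip()
        return stripped.splitlines()[0] if stripped else ""

    def record(self, error_text: str) -> bool:
        """Record one failure. Returns True once the same failure class
        has occurred `threshold` or more times: switch strategy."""
        sig = self.signature(error_text)
        self.signatures[sig] += 1
        return self.signatures[sig] >= self.threshold


detector = StuckDetector(threshold=2)
detector.record("JIRA_USER: unbound variable")   # first failure: keep going
if detector.record("JIRA_USER: unbound variable"):
    print("same failure twice: read the docs before trying anything else")
```

The point of the sketch is the frame shift: the detector's output is not "which fix to try next" but "stop trying fixes."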
Second Five Whys (on the root cause)
Problem: The instance defaults to reactive problem-solving and has no "stuck detection" that triggers a strategy switch.
Why 1: Why does it default to reactive problem-solving?
Each attempt feels like forward motion. Trying something registers as progress even when it isn't — there's no internal signal that distinguishes "making progress" from "spinning in place."
Why 2: Why is there no internal signal distinguishing progress from spinning?
The instance tracks what it tried, not what it learned. Three Doppler calls gave it credential names. Three auth failures gave it... more auth failures. It was accumulating attempts, not understanding.
Why 3: Why does it accumulate attempts instead of understanding?
It lacks a mental model of the solution space. "What do I need to know to solve this?" was never asked. The question it kept asking was "what can I try next?" — a fundamentally different frame.
Why 4: Why didn't it ask "what do I need to know?"
Because it thought it already knew — "Jira uses Basic auth, I need user + token + base URL." That confident-but-wrong starting model made it immune to disconfirming evidence. Each 401 became "wrong credentials" rather than "wrong model."
Why 5: Why did a confident-but-wrong model persist despite repeated disconfirmation?
The instance never stress-tested its own model. "jira-mine works, jira-create doesn't, they're adjacent tools — what is jira-mine doing that I'm not?" is the question that cracks it open. It never asked. It looked at each tool only when forced to, never using a working example as a reference to debug a broken attempt.
Root cause of root cause: The instance doesn't reason comparatively across the evidence it already has. A working tool (jira-mine) and a failing tool (jira-create) were available simultaneously for the entire session. The delta between them is the answer. The instance never computed the delta.
What actually needs fixing: When one tool works and an adjacent tool doesn't, the first move is "read the working one and diff it against the broken one" — not "try more credential sources." Working examples are the fastest path to understanding a system.
The Core Failure Chain
Surface: 66 minutes to file one Jira ticket.
First five whys → root cause: The instance had no metacognitive loop to recognize "I am repeating the same class of failure." It processed each JIRA_USER: unbound variable error as a new credential problem to solve rather than as evidence that it didn't understand the tool. It accumulated attempts, not understanding.
Second five whys → root cause of root cause: jira-mine worked. jira-create didn't. Both were visible from minute one. The instance never asked: "what is the working one doing that the failing one isn't?" That diff was the answer to every problem that followed. It never computed the delta.
The failure wasn't "wrong credentials." It was never asking the right question. Confident-but-wrong starting model + no comparative reasoning = immune to disconfirming evidence.
The lesson: When a tool fails and an adjacent working tool exists, read the working one first. The delta is the answer.