The agentic autonomy ladder
Teams talk about "AI agents" like it's one thing. It isn't. There's a ladder of autonomy and each rung has different trade-offs around speed, reliability, and how much human attention it costs to run.
Most teams overshoot. They aim for L4 because "autonomous" sounds impressive, then end up at L1 because L4 is brittle and L3 is unreviewed. The honest sweet spot in 2026 is L2 with tight guardrails: an agent that can read your repo, run tests, and propose a PR — but a human still ships it.
We've found that L3 is worth the effort only when (a) the task is repetitive enough that the planning overhead amortizes, and (b) the failure modes are observable. Customer-support triage, log-summarization, and code-migration scripts fit. Open-ended product work doesn't, yet.
The team-design implication: hire engineers who can compose L2 agents fluently, not engineers who promise L4. The former ship every week. The latter ship a demo.
