Unintended Second Order Effects

Around late 2025 Claude Code introduced a staple of agentic coding workflows into its harness: Plan Mode. Fast forward a few months and most developers will mention it as one of their favorite ways of solving problems with agents.

Like many other features, Plan Mode codified a workflow practitioners had already adopted into the harness. Armin Ronacher has a great write-up on how Claude's Plan Mode works here, and sure enough, once again, it's not that magic after all.

Insert markdown trenchcoat meme

I think what's relevant here are the tradeoffs we take when we opt into this workflow and the second-order effects of this kind of UX.

The TL;DR of this newsletter issue: a plan is useful for documenting and preserving a direction once the direction is clear. It cannot decide whether the direction is right. Usually the missing step happens before the plan, in the conversation where the user and the agent figure out what is actually being asked. And since this workflow has gained so much popularity, some people just assume it magically works and solves all problems. It does not.

Capturing user intent. That's the hard part. If you have ever had to sit and listen to a non-technical person describe a solution for a system, you may have experienced this firsthand: the constant "but why?" popping into your mind while the person talks. If you don't understand the underlying intent, proposing a solution is pointless. Assumptions are the root of all evil, sometimes.

I've had many back-and-forths with this workflow and have been settling on something more sensible, at least in my opinion. Let's dive in.

The Socratic Method

Armin asks this in that article: (…) why can I not just ask the model to plan with me? Why do I have to switch the user interface into a different mode?

Fair point. In Just Talk To It, Peter Steinberger (OpenClaw) says he rarely uses big plan files with Codex. Codex used to lack a dedicated plan mode, but he could write "let's discuss" or "give me options" and the model would wait until he approved the direction. In his words: "No harness charade needed. Just talk to it."

In Shipping at Inference-Speed, he explains the workflow more concretely. He starts a conversation, asks a question, lets the model search the web, lets it inspect the codebase, shapes the plan together, and only then writes "build". Plan mode, in his view, was a workaround for older models that struggled to follow conversational boundaries.

The idea of creating a plan handout from a conversation with an agent has been floating around for some time now. Its goal was mainly to manage context. But what matters most is the process you and the agent went through to arrive at that document.

Neat Looking Mess

Agents are now good enough to make weak intent look structured.

That is a new failure mode. You give the model a half-formed idea. It returns a polished checklist with risks, implementation steps, file references, and test notes. The output feels solid because the format is solid.

The premise may still be wrong.

The agent planned around the wrong interpretation. That failure is harder to spot because it now looks competent. It did not crash, hallucinate wildly, or ignore the task. It did something more dangerous: it made your ambiguity look organized.

Planning too early freezes ambiguity instead of resolving it.

Everyone using agents has some version of this story: the agent produced something that seemed reasonable from a distance, only for you to find obvious implementation mistakes that were not even addressed during planning. And now you're wasting time asking it to refactor it.

Intent Extraction

We recently had a conversation with Jesse Vincent on our Fragmented Pod 🎙️ (coming out soon), where he described a brainstorming phase that happens before the plan.

That is a fundamental skill in Jesse's Superpowers framework. The Superpowers brainstorming skill does not let the agent sprint into implementation. It forces a design conversation first: clarify the problem, ask targeted questions, explore a few approaches with tradeoffs, and present the design in pieces for approval. Only after the design is accepted does the workflow move toward a plan.

The idea is that good planning starts when the agent can restate the user's goal in a way the user recognizes. If the agent cannot do that, it's too early to plan.

A better workflow makes intent extraction explicit. The agent should help us notice missing constraints, hidden preferences, vague goals, and unresolved tradeoffs before it writes the implementation plan.

Make It Annoying On Purpose

Matt Pocock's "grill me" skill is another version of the same idea.

The skill forces the user to answer the questions they were avoiding. What does success look like? What constraints matter? What is explicitly out of scope? What tradeoff are we making?

That kind of questioning feels annoying in the same way a good design review feels annoying. It slows down the easy answer so the better answer has a chance to appear.

A Better Workflow

The workflow I have been settling on in Pi is plan-last.

Before the agent writes a plan or touches code, it has to brainstorm. It reads the relevant files, checks the current project state, and tries to understand the request inside the actual codebase. Then it asks questions.

pi-interview is basically a stronger version of the AskUserQuestion idea that showed up in Claude Code and all other harnesses since: instead of a plain question in the chat, it opens an interactive form where the agent can ask single-select, multi-select, text, image, and info questions. You can attach screenshots to replies, answer several related questions in one pass, and give the agent structured answers instead of another vague chat blob. That makes the brainstorming phase more like a design interview than a yes/no interruption.

Pi Interview extension makes this flow way smoother
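
To make "structured answers instead of a chat blob" concrete, here is a minimal sketch of what a typed question form could look like. This is not pi-interview's actual schema: `Kind`, `Question`, `validate`, and `unanswered` are hypothetical names I made up for illustration, and only the five question kinds come from the description above.

```python
from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    # The five question kinds mentioned above; the string values are made up.
    SINGLE_SELECT = "single_select"
    MULTI_SELECT = "multi_select"
    TEXT = "text"
    IMAGE = "image"
    INFO = "info"


@dataclass
class Question:
    # Hypothetical shape, not the extension's real data model.
    id: str
    kind: Kind
    prompt: str
    options: list = field(default_factory=list)


def validate(question, answer):
    """Return True if the answer has the shape this question kind expects."""
    if question.kind is Kind.INFO:
        return answer is None  # info cards are read-only, nothing to answer
    if question.kind is Kind.SINGLE_SELECT:
        return isinstance(answer, str) and answer in question.options
    if question.kind is Kind.MULTI_SELECT:
        return (
            isinstance(answer, list)
            and len(answer) > 0
            and all(a in question.options for a in answer)
        )
    # TEXT and IMAGE: a non-empty string (free text, or a path to a screenshot)
    return isinstance(answer, str) and answer.strip() != ""


def unanswered(questions, answers):
    """List the ids of questions that still lack a valid answer."""
    return [
        q.id
        for q in questions
        if q.kind is not Kind.INFO and not validate(q, answers.get(q.id))
    ]


form = [
    Question("scope", Kind.SINGLE_SELECT, "Which surface changes?", ["api", "ui"]),
    Question("constraints", Kind.MULTI_SELECT, "What must not break?", ["auth", "billing"]),
    Question("why", Kind.TEXT, "What problem are we actually solving?"),
]
answers = {"scope": "api", "constraints": ["auth"]}
missing = unanswered(form, answers)  # the agent should not plan while this is non-empty
```

The point of the sketch is the last line: with typed questions, "is the intent clear enough yet?" becomes something the harness can check instead of something you hope the model inferred.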

Once the intent is clearer, the agent proposes a few approaches with tradeoffs.

After that, the agent writes a design document. The design captures the agreed direction: architecture, components, data flow, error handling, and tests.

Then I open that document in Plannotator. Reviewing inside a chat transcript sucks. In Plannotator, I can point at the exact paragraph that is wrong, annotate the missing constraint, or mark the assumption that needs to change. The agent takes those annotations and updates the design.

Only after that do I want a plan.

At that point the plan is not trying to discover intent. It is recording a direction we already fought over.
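
The ordering above can be sketched as a small state machine. This is only an illustration under my own assumptions: `Phase` and `PlanLastSession` are hypothetical names, not part of Pi, Plannotator, or any harness, and the rule that open annotations block the plan is my reading of the workflow rather than something a tool enforces.

```python
from enum import IntEnum


class Phase(IntEnum):
    # Phases in the order described above; the names are made up for this sketch.
    BRAINSTORM = 0
    QUESTIONS = 1
    APPROACHES = 2
    DESIGN = 3
    REVIEW = 4
    PLAN = 5


class PlanLastSession:
    """Tracks the current phase and refuses to reach PLAN with open annotations."""

    def __init__(self):
        self.phase = Phase.BRAINSTORM
        self.open_annotations = []

    def annotate(self, note):
        # Annotations only make sense once there is a design document to review.
        if self.phase is not Phase.REVIEW:
            raise RuntimeError("annotations belong to the review phase")
        self.open_annotations.append(note)

    def resolve(self, note):
        # The agent updated the design to address this annotation.
        self.open_annotations.remove(note)

    def advance(self):
        if self.phase is Phase.PLAN:
            raise RuntimeError("already at the plan")
        if self.phase is Phase.REVIEW and self.open_annotations:
            raise RuntimeError("resolve open annotations before planning")
        self.phase = Phase(self.phase + 1)
        return self.phase


session = PlanLastSession()
for _ in range(4):
    session.advance()  # brainstorm -> questions -> approaches -> design -> review

session.annotate("missing constraint: must work offline")
try:
    session.advance()
    blocked = False
except RuntimeError:
    blocked = True  # the plan is gated on the unresolved annotation

session.resolve("missing constraint: must work offline")
final_phase = session.advance()
```

Nothing here is clever. It just encodes that the plan is the last step, and that it is unreachable while the design still has contested points.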

Plan Collab with Plannotator

Don't save on your own thinking tokens

If the goal is intent extraction, typing short prompts can become the bottleneck. You need to spend your thinking tokens freely, and typing just increases friction. Stream-of-thought inputs are totally fine. Tools like Wispr Flow, VoiceInk, or any decent transcription setup let you ramble for a minute, include the reasoning behind the request, explain constraints, and give the agent more context than you would ever type manually. OpenAI's Codex best practices make the same point: speech dictation is a faster way to provide context. That extra context becomes the raw material the agent needs before it can ask better questions.

Voice Transcription Software is a Must

Think Before You Plan

You can outsource your thinking but you cannot outsource your understanding. The hard part is deciding what should be built, why it should exist, and which constraints should shape it. Plan mode helps after that work has happened.

Elsewhere in the Latent Space 🌌

Mario Zechner's Pi talk breaks down his thesis: if the model can work with a tmux session and raw keystrokes, the extra harness features need to justify themselves. The interesting bit is how much of good agent UX is restraint, not feature surface. If you're getting your feet wet with Pi, this is a good place to start.
How GPT, Claude, and Gemini are actually trained and served: Dwarkesh Patel sits down with Reiner Pope for a blackboard lecture where he walks through how frontier LLMs are trained and served. Really well put.
Thariq's Unreasonable Effectiveness of HTML pairs nicely with the visual-planning thread. Markdown is portable, but agents are increasingly producing artifacts humans mostly read, review, and annotate. HTML gives them more room to explain.
OpenAI Privacy filter: interesting use case of local models
Box is hiring for internal agent engineering roles: Aaron Levie posts about how this new role is meant “to wire up internal systems and get agents working with them effectively.” Interesting takes.

Tools

Flue is a TypeScript framework for building agents. It provides clean abstractions for creating well-defined agents. You can write an isolated code review agent without just manipulating markdown files. It finally gives you that fine-grained control, bridging the non-deterministic world of LLMs with deterministic coding. Looks really promising in my opinion.
open-slide is a slide framework built for agents: describe the deck, let the agent write React, and let the framework handle canvas, scaling, and export. It is another small sign that "agent-readable production surfaces" are becoming their own category.
Printing Press is agent tooling taken to its logical extreme: every API becomes a CLI, and compound commands become the interface. It is very much in the "the world should look like tools agents can traverse" camp.
pi-context-usage is tiny but useful: a Pi extension that visualizes context usage. Context is budget, memory, and failure surface all at once. Seeing it should be table stakes.
Pierre's Diffs is a reminder that better review surfaces are part of agent UX. If agents are going to generate more code, humans need faster ways to inspect what changed without drowning in plain text.
Hunk pushes the same review-first idea into the terminal: a diff viewer for agentic coders. The interesting direction is obvious: agent output needs review tools built around supervision, not just prettier patches.
ghui: if you're sick of GH, you can try this keyboard-driven PR review UI.

Why Plan Mode is making things worse