From Append-Only AI Sessions to a Structured Research Workflow

Motivation

My default habit was to keep appending to an existing Claude session as long as the topic felt related. The reasoning felt sound: Claude already has the context, so why throw it away?

After thinking it through, I’m convinced this is an anti-pattern — for two compounding reasons.

Problem 1: focus degradation

The “Lost in the Middle” effect is well-documented: LLMs exhibit a U-shaped attention curve (Stanford, 2023). Information at the start or end of the context window is retrieved reliably; information in the middle degrades.

Real-world data from extended sessions:

Context UsageObserved Behaviour
~20–40%Losing track of earlier decisions, circular reasoning
40–50%Noticeably degraded; auto-compaction kicks in
~60%Repetition, forgetting earlier decisions, thrashing
80%+Gets rough

Sonnet 4.6 has two mitigations:

For IT design work, this is dangerous. Early design decisions, rejected alternatives, and upfront constraints all drift toward the middle as the session grows. Claude won’t say “I’ve lost track of that.” It will answer confidently from a degraded representation.

Problem 2: cost

Without optimisation, the entire conversation history is sent as input tokens on every round trip. Sonnet 4.6 pricing:

Token typeRate
Input (standard)$3.00 / 1M tokens
Cache write$3.75 / 1M tokens
Cache read$0.30 / 1M tokens
Output$15.00 / 1M tokens

The 90% prompt caching discount softens the cost curve significantly. Cost alone isn’t a decisive argument for fresh sessions — but it compounds with the quality problem: you’re paying more for worse attention.

The MCP fetch trap

My second habit was to treat MCP-fetched source code and documentation as a reason to continue a session. The source is already in context, no need to re-fetch.

This inverts the attention quality problem. Fetched source is pulled early in a session. By turn 10 or 15, it has drifted to the middle of the context — exactly where attention is worst. You avoided a re-fetch, but traded it for Claude reasoning over a poorly-attended copy of the source.

Fetched code and documentation are the worst candidates for long-term context residency:

The failure mode is insidious: Claude doesn’t say “I’ve lost track of that file.” It confidently suggests a fix that contradicts a constraint in the source it already read.

Re-fetching in a fresh session isn’t a cost — it’s a quality gain. A fresh fetch positions the source at the end of a short context, in the highest-attention zone. Unlike human memory, LLMs have no unconscious background processing. Information doesn’t consolidate between turns — it either survives in the active context or it doesn’t exist.

Long context windows are not memory systems

The marketing pitch for large context windows focuses on the ceiling: “you can fit 1M tokens.” The honest framing is about the floor: the only hard guarantee a context window provides is at its boundary — things outside it are gone. Everything inside is subject to attention quality, which is a gradient, not a binary.

A larger window doesn’t flatten that gradient. It stretches it. A 1M context doesn’t give you 1M tokens of reliable attention — it gives you 1M tokens of varying attention, with an even larger danger zone in the middle. A large window also removes the forcing function of a tight one. Sloppiness scales.

Long context windows are a genuine advantage for one-shot processing of large static documents. The problem is specifically multi-turn, accumulating conversations.

The better question is never “how much can I fit?” but:

What is the minimum viable context that contains everything Claude needs and nothing it doesn’t?

This reframes the context window from a storage bucket into a workbench. The workflow below is built entirely around that principle.

The workflow pattern

Design principles

  1. Sessions are topics, not projects. One focused research question per session.
  2. Externalise knowledge immediately. Claude’s working memory is unreliable over long sessions; a well-written spec file is permanent and precise.
  3. Spec quality comes from your review, not Claude’s attention. Read and edit the spec yourself before treating it as canonical.
  4. Re-fetch rather than rely on buried context. The marginal cost of an MCP call is negligible compared to the quality of a fresh, high-attention read.
  5. Session boundaries are structural decisions, not judgement calls. Define criteria upfront and apply them mechanically.

The workflow


Phase 0 — session setup

0a. Create a session manifest file (see details below). List only the prior spec files directly relevant to today’s research question in the load: front matter. If unsure whether a spec is relevant, it probably isn’t.

0b. Open a fresh session. Load the session manifest file (details below).


Phase 1 — focused fetch and initial analysis

1. State your narrowly scoped research question. One topic per session.

2. Claude fetches source/doc via MCP.

3. Claude answers with initial analysis. Read it fully before replying.


Phase 2 — back-and-forth until genuine understanding

4. Go back and forth with Claude until you genuinely understand the why and each proposed step. No skipping.


Phase 3 — spec writing and human review

5. Ask Claude to write the spec/note file.

6. Read the spec fully. Fix anything wrong or incomplete yourself, then ask Claude to incorporate your edits.

7. Ask Claude: “Does anything in this session’s findings contradict or modify the specs you loaded at the start? If so, update them now.”


Phase 4 — context management

8. /compact

9. Immediately after compaction, ask Claude to re-read the spec it just wrote. Do not skip this.

Compaction runs in tiers: old MCP tool results are physically replaced with [Old tool result content cleared], then everything is lossy-summarised. Re-reading the spec re-anchors Claude to the precise document, not a vague summary.


Phase 5 — continue or reset

10. Apply a concrete rule:

ConditionAction
First full cycle, next topic closely relatedContinue — back to Phase 1
Already the second full cycle/clear unconditionally
Drift observed in Claude’s responses/clear unconditionally

11. If continuing: load any additional prior specs the new topic requires now, before starting.

12. If clearing: go back to Phase 0 — write a new session manifest with the spec just written and any other relevant prior specs in load:.


Phase 6 — implementation (separate session)

13. Only begin implementation after all relevant specs are written, reviewed, and consistent.

14. Fresh session. Write a session manifest with only the specs the specific task directly touches in load:.

15. Treat specs as ground truth. If implementation reveals something a spec got wrong, fix the spec first.

The session manifest file

To make loading the right files a deliberate, low-friction act, use a session manifest format as a single entry point for each Claude Code CLI session.

Format

---
load:
  - specs/A1.md
  - specs/B2.md
  - specs/C3.md
goal: "Understand how B2's session storage integrates with A1's auth token lifecycle"
---

## Session notes

Prior session conclusions, open questions, constraints, what not to re-cover.
This file is committed to git — write it as a future-you artifact.

load: — imperative verb, unambiguous instruction. Paths relative to CWD, not to the session file itself.

goal: — frames Claude’s attention before it reads anything, and serves as a retrieval cue after /compact.

Repo structure

Git track your spec files and session files.

project-docs/
├── specs/
│   ├── A1.md
│   ├── B2.md
│   └── C3.md
└── sessions/
    ├── session-1.md
    └── session-2.md

Make it a skill

This is better implemented as a Claude Code skill — a slash command that handles both scaffolding new session manifests and loading existing ones. Feed the URL of this blog post to Claude and ask it to generate the skill. For an example, see this skill definition.

Key takeaways