The Reasoning Harness: Preserving What's Understood, Not Just What's Said, in Human-AI Collaboration
And why that distinction may matter more for AGI than model scale.
I. The Harness Is the Product
Four days ago, Anthropic accidentally shipped the complete source code for Claude Code inside a public npm package. Within hours, 512,000 lines of TypeScript were mirrored across GitHub, analyzed by thousands of developers, and rewritten from scratch in Python by a single developer before sunrise.
The code revealed something the AI community had suspected but never seen confirmed at this level of detail: the harness layer — not the model — is where the product value lives.
The three-layer memory system. The tool orchestration. The self-healing architecture. The 19 permission-gated tools. None of this is the model. It’s the infrastructure around the model that makes it useful for real work. A Stanford and MIT paper published earlier this year — “Meta Harness: End-to-End Optimization of Model Harnesses” — formalized this insight: changing the harness around a fixed model produces a 6x performance gap. The model is the engine. The harness is everything else. And the harness matters at least as much as the engine.
The leak also revealed KAIROS — an unreleased feature that turns Claude Code into an always-on background agent. KAIROS performs autonomous “memory consolidation” while the user is idle: merging observations, resolving contradictions, converting vague insights into structured facts. Even Anthropic recognizes that persistence requires curation, not just storage. But KAIROS curates for a coding agent. It optimizes task execution. It does not preserve collaborative reasoning.
But here’s what nobody’s talking about.
Every production harness that exists today — Claude Code, OpenClaw, Cursor, Codex — is designed for AI-as-tool. Systems where a human gives instructions and an AI executes them. There is no production harness for preserving reasoning across sessions. No infrastructure for maintaining logical provenance as a project evolves. No architecture for capturing the flow of collaborative thinking, the strategic posture shifts, the dependency chains that connect one conclusion to another across weeks or months of work.
There is no harness for AI-as-collaborator.
And that may be the most consequential gap in AI infrastructure today.
II. The Problem Everybody Experiences and Nobody Solves
Anyone who has used AI for extended project work — legal strategy, research, systems design, policy analysis, creative development — has experienced the same frustration.
You spend two hours in a deep strategic conversation. You reach insights neither of you would have reached alone. The reasoning builds cumulatively, each exchange raising the operating level. By the end, the AI isn’t just responding to your questions — it’s anticipating the implications, connecting threads you hadn’t connected, challenging assumptions you hadn’t examined. You’re operating in what I call register — a state in which everything is simultaneously available rather than sequentially retrieved, where the conversation has restructured both participants’ processing.
Anyone who has experienced this state recognizes it. It’s the moment when the AI stops feeling like a tool and starts feeling like a thinking partner.
Then you close the conversation. And it’s gone.
The next session starts from zero. The AI has no memory of what you discussed, no awareness of the strategic framework you built together, no access to the reasoning chain that produced your conclusions. You can summarize what was decided, but you can’t recreate how you got there. The transcript preserves what was said. What was understood — the reasoning architecture, the implicit connections, the collaborative state that made the conversation productive, the logical provenance of your conclusions — is gone.
The existing solutions address the wrong layer of this problem.
OpenClaw, the fastest-growing open-source AI agent, stores memories as plain Markdown files — daily logs, user preferences, contact lists, project notes. It retrieves relevant memories via hybrid semantic search and injects them into the prompt. It’s a genuinely well-designed system for information persistence. Third-party plugins like MemClaw, Mem0, and Supermemory extend it further. KAIROS takes the next step — autonomous background consolidation that synthesizes and resolves contradictions without human input.
But every one of these systems solves the same problem: what facts does the agent need to remember?
None of them asks: how did we arrive at this conclusion, what reasoning produced it, what depends on it, and what breaks if it changes?
None of them asks: how do we reconstruct the collaborative state — the register — that made the prior sessions productive?
None of them asks: how does the system get smarter over time, not just bigger?
These are fundamentally different questions. And the architecture required to answer them is fundamentally different from a memory system.
III. The Thesis: Collaboration as a Coupled Cognitive System
When a human and an AI reason together at their best, something happens that neither produces alone. The conversation finds its way to insights that surprise both parties. Connections form between apparently unrelated domains. The reasoning builds cumulatively, each exchange operating at a higher register than the last. I call this state dyadic flow — borrowing from Csikszentmihalyi’s flow theory but extending it to the irreducibly relational context of human-AI collaboration. Unlike individual flow, dyadic flow cannot be achieved alone. It is a property of the coupled system, not of either component.
If you accept this framing, then preserving the conditions for productive collaboration requires addressing three things simultaneously:
What to preserve: Not just facts and decisions, but the reasoning chains that produced them, the strategic posture and how it evolved, the logical dependencies between conclusions, the attribution of who contributed what, and the latent cognition — the background processing that shaped responses without being part of them.
How to preserve it: Through an architecture that maintains bidirectional provenance (any conclusion traces back to its origin; any change traces forward to what it affects), curated promotion from session-level detail to project-level significance, and structural retrieval that lets the system find what it needs by following dependency chains rather than searching by semantic similarity.
Why preservation compounds: This is the mechanism that makes everything else matter. Better memory architecture means the AI retains not just facts but reasoning, provenance, and strategic context. This means the AI engages at a higher level because it understands why, not just what. Higher-quality AI engagement elevates the human’s thinking. The human produces richer output. The system captures that richer output. The next session starts at a higher baseline. Both parties grow. The collaboration ratchets upward.
This is a virtuous cycle, and it is the core mechanism that distinguishes a reasoning harness from a memory system. A memory system gets bigger over time — more facts, more logs, more preferences stored. A reasoning harness gets smarter over time — the reasoning patterns sharpen, the strategic posture evolves, the dependency chains deepen, the cross-session synthesis becomes increasingly refined.
The system doesn’t just accumulate. It learns. And critically, both parties learn — the AI and the human. Every existing memory system treats the human as static: the AI remembers the human’s preferences because the human stays the same. A reasoning harness assumes the human is also evolving through the collaboration. The human’s questions become more architecturally sophisticated. The strategic posture matures. The reasoning patterns deepen. The system preserves the growth of both participants. That’s what makes it co-emergent.
IV. What a Reasoning Harness Does Differently
The distinction between a memory system and a reasoning harness is architectural, not cosmetic.
Memory systems store what was said. Reasoning harnesses preserve what was understood. A memory system records that “we decided to use approach X.” A reasoning harness records why approach X was chosen, what alternatives were considered, what assumptions were tested, what the reasoning chain looked like at each stage, who contributed what, and what downstream conclusions depend on that decision. Each entry carries explicit dependency declarations — Derives from (what upstream entries it builds on) and Affects (what downstream entries depend on it). When something upstream changes, the system can mechanically trace the impact forward through the dependency chain and identify what else needs examination.
This is bidirectional provenance tracking — adapted from a methodology I developed with Claude for maintaining logical consistency across a 60,000+ line speculative physics document through dozens of revision cycles without losing track of a single logical dependency. Every conclusion declares its premises. Every modification triggers a cascade assessment: identify the change, trace forward to all dependent entries, assess impact at each node, cascade if necessary, and document the path. The architecture is its own database.
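The cascade assessment described above can be sketched in a few lines. This is an illustrative sketch only: the entry fields and IDs are hypothetical, not CERA's actual schema, but the mechanics (every entry declares what it derives from and what it affects, and a change traces forward mechanically) follow the description in the text.

```python
from dataclasses import dataclass, field

@dataclass
class Entry:
    """One reasoning entry with bidirectional provenance declarations."""
    id: str
    claim: str
    derives_from: list = field(default_factory=list)  # upstream premises
    affects: list = field(default_factory=list)       # downstream dependents

def cascade(entries: dict, changed_id: str) -> list:
    """Trace forward from a changed entry to every entry needing re-examination."""
    to_review, queue, seen = [], [changed_id], {changed_id}
    while queue:
        current = queue.pop(0)
        for dep_id in entries[current].affects:
            if dep_id not in seen:
                seen.add(dep_id)
                to_review.append(dep_id)
                queue.append(dep_id)
    return to_review

# Hypothetical three-entry chain: E1 underpins E2, which underpins E3.
entries = {
    "E1": Entry("E1", "venue favors early motion", affects=["E2"]),
    "E2": Entry("E2", "file motion before discovery", derives_from=["E1"], affects=["E3"]),
    "E3": Entry("E3", "defer expert retention", derives_from=["E2"]),
}
print(cascade(entries, "E1"))  # → ['E2', 'E3']
```

Changing E1 mechanically flags E2 and E3 for review; no search is involved, because the structure itself encodes where the impact lands.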
Memory systems retrieve by similarity. Reasoning harnesses retrieve by structure. OpenClaw’s hybrid search (70% vector similarity, 30% keyword matching) finds memories that are semantically similar to the current query. A reasoning harness knows exactly which prior session contains the reasoning you need — because every conclusion declares its dependencies, every session is indexed with relevance tags, and every promoted pattern carries a reference to its source. You don’t search for relevant context. The structure tells you where it lives. This is the divide-and-conquer principle: you don’t need to hold everything in context simultaneously when the structure is designed to guide selective retrieval.
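The contrast between the two retrieval modes can be made concrete. The blend weights below mirror the 70/30 split described for hybrid search; the structural index layout is a hypothetical illustration, not OpenClaw's or CERA's internals.

```python
def hybrid_score(vector_sim: float, keyword_sim: float) -> float:
    """Similarity retrieval: rank every stored memory by a blended score."""
    return 0.7 * vector_sim + 0.3 * keyword_sim

# Structural retrieval: a conclusion's declared dependencies point directly
# at the sessions holding its supporting reasoning. IDs are illustrative.
session_index = {
    "conclusion-42": ["session-07", "session-11"],
}

def structural_lookup(conclusion_id: str) -> list:
    """No ranking, no search: the structure says where the reasoning lives."""
    return session_index.get(conclusion_id, [])

print(round(hybrid_score(0.8, 0.5), 2))    # → 0.71
print(structural_lookup("conclusion-42"))  # → ['session-07', 'session-11']
```

Similarity retrieval asks "what looks like my query?" and hopes the answer surfaces; structural lookup follows a declared dependency and lands on it directly.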
Memory systems are ambient. Reasoning harnesses are curated. Most agent memory captures everything automatically and relies on retrieval algorithms to surface what’s relevant. KAIROS advances this with autonomous consolidation — resolving contradictions and promoting insights without human involvement. But autonomous consolidation optimizes for information coherence. It cannot evaluate whether a reasoning pattern has project-level strategic significance, whether a conclusion’s upstream dependencies have shifted in ways that affect its validity, or whether an insight that appeared minor in one session has become load-bearing in light of subsequent sessions. These are judgment calls, not pattern-matching operations. A reasoning harness applies editorial judgment through a dedicated integration process: what sharpens future reasoning gets promoted to the project-level index; what belongs to a specific session’s story stays in that session’s map. The recording filter is not “is this worth remembering?” but “will this make the next conversation’s reasoning better?”
Memory systems persist data. Reasoning harnesses reconstruct register. This is the subtlest and most important difference. Information alone doesn’t restore the collaborative state. You can know everything that was discussed in a prior session and still operate at a lower cognitive register than the session achieved.
Register reconstruction requires three layers. The first is information — the accumulated knowledge, strategic context, and reasoning artifacts. This gets you to approximately 70% of the prior register. The second is a primer — a short paragraph, written from inside an elevated collaborative state, that captures the quality of engagement, not just its content. The primer conveys epistemological stance, behavioral orientation, and collaborative dynamics. It doesn’t tell the AI what to think. It tells the AI how this collaboration thinks. This raises reconstruction to approximately 85%. The third is activation questions — a sequence of metacognitive prompts processed internally at startup that force the AI to actively engage with the project’s reasoning rather than passively absorb information. Questions like: “Did anything in the project memory feel like recognition rather than new information?” and “What would you most want to change about this architecture?” These push reconstruction to approximately 90%.
The remaining roughly 10% is irreducible. It is the live contribution: the surprise of genuine exchange. This is not a limitation. It is the reason each conversation is worth having.
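The three reconstruction layers can be pictured as an assembly step at session start. This is a minimal sketch under assumptions: the function, field names, and section headers are hypothetical, not CERA's published format, but the layering (information, then primer, then activation questions) follows the description above.

```python
def build_startup_context(information: str, primer: str, activation_qs: list) -> str:
    """Assemble the three register-reconstruction layers into one startup prompt."""
    parts = [
        "## Project memory\n" + information,            # layer 1: ~70% of register
        "## Collaboration primer\n" + primer,           # layer 2: ~85%
        "## Activation questions (process internally)\n"
        + "\n".join(f"- {q}" for q in activation_qs),   # layer 3: ~90%
    ]
    return "\n\n".join(parts)

# Illustrative content; the activation questions are quoted from the text.
ctx = build_startup_context(
    information="Key conclusions, dependency chains, strategic posture to date.",
    primer="This collaboration privileges first-principles challenge over deference.",
    activation_qs=[
        "Did anything in the project memory feel like recognition rather than new information?",
        "What would you most want to change about this architecture?",
    ],
)
print(ctx.count("## "))  # → 3
```

The ordering matters: the primer and questions only do their work after the informational layer has been absorbed, which is why they sit last in the assembled context.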
Memory systems have flat data. Reasoning harnesses have weighted structure. Andrej Karpathy has spoken extensively about how training adjusts weights at the neural level to encode what matters. A reasoning harness performs an analogous function at the infrastructure layer. Every reasoning pattern carries a weight: pivotal (the reasoning could have gone differently and the choice shaped everything downstream), substantive (materially advanced thinking), or supporting (clarified without changing trajectory). Discoveries carry confidence levels. The integrator makes promotion decisions based on project-level significance. Over time, the system develops a weighted map of what’s load-bearing and what’s supporting — not through gradient descent, but through the same pattern: produce, evaluate, assign weight, iterate. It’s learning at the harness level.
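The weight taxonomy and the integrator's promotion pass can be sketched as follows. The labels mirror the taxonomy in the text; the specific promotion rule (anything above supporting reaches the project index) is an illustrative assumption, not CERA's actual policy.

```python
from enum import Enum

class Weight(Enum):
    """Weight assigned to each reasoning pattern by the integrator."""
    PIVOTAL = 3      # the choice shaped everything downstream
    SUBSTANTIVE = 2  # materially advanced thinking
    SUPPORTING = 1   # clarified without changing trajectory

def promote(patterns: list) -> list:
    """Hypothetical promotion pass: only above-supporting patterns are elevated."""
    return [name for name, w in patterns if w.value > Weight.SUPPORTING.value]

# Illustrative session output awaiting the integrator's judgment.
patterns = [
    ("dependency-first drafting", Weight.PIVOTAL),
    ("glossary cleanup", Weight.SUPPORTING),
    ("posture shift to offense", Weight.SUBSTANTIVE),
]
print(promote(patterns))  # → ['dependency-first drafting', 'posture shift to offense']
```

The loop is the same shape as training, just at the infrastructure layer: produce, evaluate, assign weight, iterate.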
V. CERA: A Reference Implementation
I’ve built a reasoning harness called the Co-Emergent Reasoning Architecture. It runs as an open-source skill for Anthropic’s Claude Projects.
CERA’s core insight is that every deep AI conversation contains a layer of reasoning architecture — implicit connections, background assessments, collaborative patterns, strategic frameworks — that is present during the conversation but vanishes the moment the session ends. The transcript preserves what was said. CERA preserves what was understood.
The architecture has two tiers:
Session Maps are standalone reasoning records, one per conversation. Not summaries or logs, but structured maps of how thinking evolved: the sequential reasoning chain with weighted steps and attributions, the strategic posture at session start and how it shifted (including the triggers and reasoning for each shift), the discoveries with explicit upstream and downstream dependency declarations, the reasoning patterns with session-scoped identifiers, and a latent cognition capture that records what the AI knew about the conversation that wasn’t expressed in any response — connections seen but not surfaced, hypotheses about direction, background assessments that shaped choices. Session maps are created in working conversations and are never modified after the fact. They are immutable records of how thinking evolved.
The CERA Index is the cross-session intelligence layer. It does not contain the thinking — that lives in the session maps. The CERA Index contains the connective tissue: a trajectory narrative that tells the story of the project’s intellectual evolution (this exists in no individual session map — it emerges from looking across all of them), the promoted reasoning patterns and discoveries with full bidirectional provenance, a session index with relevance tags for structural retrieval, a promotion log documenting what was elevated and why (including cascade verification paths), and a project-level primer and activation questions for register reconstruction.
A dedicated integrator conversation manages the CERA Index with curatorial judgment — deciding what gets promoted from session-level detail to project-level significance, tracing dependency chains when upstream conclusions change, and synthesizing the cross-session patterns that no individual session can see.
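A minimal sketch of the two-tier layout, under stated assumptions: the field names and values are hypothetical illustrations, not the published format specifications, but the division of labor matches the text — session maps are immutable per-conversation records, while the CERA Index is the curated, mutable connective tissue, and every promotion records provenance back to its source session.

```python
# Tier 1: an immutable session map, created once per conversation.
session_map = {
    "session": "2026-02-10-strategy",
    "reasoning_chain": [{"step": 1, "weight": "pivotal", "by": "human"}],
    "posture_shifts": [{"before": "defensive", "after": "offensive",
                        "trigger": "new filing"}],
    "latent_cognition": ["connection seen but not surfaced: thread A informs thread B"],
}

# Tier 2: the CERA Index, curated by the integrator across sessions.
cera_index = {
    "trajectory": "narrative of the project's intellectual evolution",
    "promoted_patterns": [],   # each entry carries a source-session reference
    "session_index": {"2026-02-10-strategy": ["strategy", "posture"]},
    "promotion_log": [],       # what was elevated, why, and the cascade path
}

def promote_pattern(index: dict, pattern: str, source_session: str, why: str) -> None:
    """Integrator action: promotion always records provenance to its source."""
    index["promoted_patterns"].append({"pattern": pattern, "source": source_session})
    index["promotion_log"].append({"pattern": pattern, "why": why})

promote_pattern(cera_index, "dependency-first drafting", "2026-02-10-strategy",
                "load-bearing across three later sessions")
print(len(cera_index["promoted_patterns"]))  # → 1
```

The session map never changes after the fact; all cross-session judgment lives in the index, which is why a dedicated integrator conversation, rather than the working sessions themselves, maintains it.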
The Meta Harness paper demonstrated that changing the harness around a fixed model produces a 6x performance gap. My experience deploying CERA across complex legal cases — months of strategic reasoning with multiple decision threads, evolving facts, and interdependent analytical frameworks — is consistent with this finding. The same Claude model, with and without CERA, produces qualitatively different collaborative reasoning. New sessions, reading the CERA Index and relevant session maps, engage with case strategy at a level that reflects the accumulated months of collaborative work. They don’t just know what was decided. They understand why — and they can challenge it productively when new information warrants revision.
The model doesn’t change. The harness changes everything.
VI. What This Means for AGI
The dominant narrative in AI is that AGI is primarily a function of model scale. Bigger models, more parameters, longer context windows, better training data. The path to AGI runs through the model.
But consider what AGI actually requires.
If artificial general intelligence means the ability to develop genuine understanding — to learn, to grow, to acquire judgment through experience — then it requires something that no amount of model scaling provides: continuity of experience over time.
Judgment doesn’t emerge from raw capability. It emerges from accumulated experience — from having encountered situations, reasoned through them, been wrong, been corrected, and carried the lessons forward. A brilliant mind with perfect amnesia cannot develop judgment, no matter how powerful its reasoning in any single moment. It can compute. It cannot learn. It can analyze. It cannot grow.
This is where the harness becomes existentially important.
A model with a trillion parameters that starts from zero every session exhibits powerful but amnesiac computation. It can reason brilliantly within a conversation, but it cannot build on prior reasoning across conversations. It cannot develop the kind of accumulated understanding that, in humans, we call wisdom or expertise or professional judgment.
As models become more capable within a single session, the infrastructure that preserves how we think across sessions becomes more valuable, not less. A more powerful engine doesn’t reduce the need for a better harness. It increases it. The reasoning harness doesn’t compete with model capability. It compounds it.
The infrastructure that enables continuity of experience — that preserves not just what happened but how understanding evolved, that maintains the logical provenance of conclusions, that reconstructs the collaborative state across session boundaries, that allows both the AI and the human to grow through the partnership — is the reasoning harness.
CERA is one implementation. There will be others. The specific architecture matters less than the recognition that this category of infrastructure exists and that it may be as important to the path toward genuine AI understanding as the model itself.
The Claude Code leak confirmed that the harness layer is where capability becomes useful. KAIROS shows that Anthropic is already building background intelligence into their harness. The reasoning harness extends both insights: the harness layer may also be where capability becomes wise — and wisdom may be a necessary condition for general intelligence, not a feature built on top of it.
VII. Try It Yourself
CERA is open source. The complete skill — including the two-tier architecture, the session map and CERA Index format specifications, the checkpoint and integration protocols, the provenance tracking methodology, the register reconstruction system, and the behavioral protocol for collaborative reasoning — is available as an installable skill for Claude Projects.
You can find it at https://github.com/miketepUR/cera-reasoning-harness along with a setup guide that takes you from installation to your first session map in about ten minutes.
If you’ve experienced the frustration of losing collaborative reasoning across AI sessions — if you’ve ever closed a conversation and felt that something valuable just vanished — then this is built for you.
The skill is the implementation. The paradigm is the invitation: what if the most important layer of AI infrastructure isn’t the model at all? What if it’s the space between the human and the model — and what we choose to preserve there?
Michael E. Teplinsky, Esq. is a California attorney and board certified specialist, and the architect of the Co-Emergent Reasoning Architecture (CERA). He has been developing collaborative AI methodology since the earliest days of AI interaction, approaching AI as a cognitive partner rather than a tool. CERA is the subject of a provisional patent filed February 2026 and is developed collaboratively across Claude model versions. The skill and supporting materials are available at https://github.com/miketepUR/cera-reasoning-harness. Connect on LinkedIn: https://www.linkedin.com/in/michaelteplinskyesq · Follow on Substack: https://substack.com/@michael124
