milkkarten 2 days ago
Author here. TL;DR:
Long-horizon embodied agency is a harness problem, not a model-scale problem. Coding agents like Claude Code work because of scaffolding (prompt, skills, memory, sub-agents) around the model. Embodied agents haven't had an equivalent.
Through iterative harness refinement, Gemini Plays Pokémon (GPP) became the first AI to complete Pokémon Blue, Yellow Legacy on hard mode, and Crystal without losing a single battle. Early on, a human edited the harness. By Crystal, the model was doing it itself: naming its own strategies, writing truth tables for puzzles, and wrapping loopholes into reusable primitives.
Continual Harness automates this fully. Starting from a raw interface with no curated knowledge, a Refiner reads the recent trajectory every F steps and applies edits to the prompt, sub-agents, skills, and memory -- no resets. From scratch, it closes most of the gap to a hand-engineered expert harness.
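For concreteness, here's a minimal Python sketch of that loop as I'd describe it. The names (Harness, Refiner, agent.act, refiner.refine) and the env interface are placeholders for illustration, not the actual API:

    # Minimal sketch of the Continual Harness loop. All names here
    # (Harness, Refiner, agent, env) are illustrative placeholders.
    from dataclasses import dataclass, field

    @dataclass
    class Harness:
        prompt: str = ""                                 # system prompt, edited in place
        skills: dict = field(default_factory=dict)       # reusable primitives
        memory: list = field(default_factory=list)       # persistent notes
        sub_agents: dict = field(default_factory=dict)   # specialized helpers

    def run(agent, refiner, env, harness, F, total_steps):
        """Agent acts continuously; every F steps the Refiner reads
        the recent trajectory and edits the harness. No resets."""
        trajectory = []
        obs = env.reset()
        for step in range(total_steps):
            action = agent.act(obs, harness)    # model + current harness
            obs = env.step(action)
            trajectory.append((obs, action))
            if (step + 1) % F == 0:
                # Refiner proposes edits to prompt/skills/memory/sub-agents
                # based on the last F steps; the harness persists across calls.
                harness = refiner.refine(harness, trajectory[-F:])
        return harness

The key design point is that the harness object is never reset: edits accumulate across the whole run, so skills discovered early stay available late.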
Our key findings:
(1) Iterative harness refinement closes most of the gap to a hand-engineered version.
(2) Long-horizon agency requires self-refinement, and self-refinement requires a useful model.
(3) The future of agents is model-harness co-learning.
Demos: https://sethkarten.ai/continual-harness