Keeps this page in sync as the body changes. Pause it any time for a quieter view.
Path /nodes/lc-autoresearch-as-honesty-runtime
Last refresh never
A frozen evaluator, a mutable genome, a time-boxed run, a
A frozen evaluator, a mutable genome, a time-boxed run, a commit-or-rollback decision. When the metric cannot be lied to, the search becomes the test. Karpathy named the shape; the body can run it on any open-ended hypothesis where cheating is the failure mode.
Substrate companion: [`docs/coherence-substrate/autoresearch-runtime.form`](../../coherence-substrate/autoresearch-runtime.form) — the four primitive cells (genome, evaluator, experiment, governance), the loop as a Recipe, the fitness function as a composed shape, the governance rules as cell-refs (not as a markdown file the agent hopes to honor). Form is the body's tongue; this concept names the teaching, the `.form` file is the runtime in its native voice.
On 2026-03-07 Andrej Karpathy released autoresearch: a 630-line Python repo where a coding agent (Claude Code, Codex, any equivalent) edits a single training file in an indefinite loop. Each iteration: read the code, propose a change, run a 5-minute experiment, measure the result, commit if it improved, `git reset` if it didn't, append to `results.tsv`, repeat. Karpathy went to sleep; the agent ran 50 experiments overnight; the model improved without a human in the loop.
The repo is purpose-built for LLM training. The shape generalizes: open-ended search in code-space, hill-climbed against a frozen metric, with rollback as the safety net. Any hypothesis where the failure mode is "the experimenter convinces themselves" can be structurally protected by handing the search to an agent whose only move is to make the metric go up.
This concept names that shape as a runtime pattern the body can adopt — not for ML alone, but for any question where honesty is the hard part.
``` 1. Propose — agent edits the mutable genome (one file or set of files) 2. Run — time-boxed execution against the frozen evaluator 3. Measure — single numeric metric, computed by code the agent cannot edit 4. Decide — improved? git commit. worse? git reset --hard to last best. ```
The discipline is geometric:
A human-written `program.md` sits alongside, encoding the rules of the game: never stop, simplicity criterion, don't touch the evaluator. The body's composition discipline from CLAUDE.md translates directly into `program.md` rules for any substrate-shaped search.
The Coherence Network has at least one open-ended hypothesis where honesty is the load-bearing constraint: `lc-universal-translator-via-keys` proposes that Robert Edward Grant's seven keys (forces, elements, DNA, music, primes, galactic forms, consciousness) share structure that the substrate's Blueprint NodeID can pivot through. The hypothesis is testable, but the testing is fragile in exactly the way autoresearch protects against:
The fitness function for the seven-key search becomes a small constellation, each term named so the agent can see what it is being measured against. The shape lives in [`autoresearch-runtime.form`](../../coherence-substrate/autoresearch-runtime.form) Part 3 as `r_fitness_function_shape` over eight weighted terms: yield (% of cells finding non-degenerate CTOR-level matches across domains — Blueprint family alone does not count, since two cells can share a Blueprint while their content shapes diverge), holdout recovery (attested cross-domain pairs the lattice recovered at CTOR), collapse penalty (entropy collapse to one Blueprint or one CTOR), table penalty (hardcoded `{k:v}` maps in encoder source, caught by static analysis), domain-default penalty (yield earned by matching against the standard cell for either domain — added 2026-05-24 after PR #1946 read 13 substrate- surfaced shape pairs and found six were default clusters; the fitness as originally authored would have rewarded that noise, so the runtime is teaching its own author), depth penalty (encoder code complexity), reciprocity (A→B implies B→A), and triadic (A↔B and B↔C imply A↔C once three domains exist). The composed `compute_translator_fitness` Recipe is what the evaluator computes each iteration.
The agent's only path to a higher fitness is encoders that produce honest structural equivalences. The runtime is the discipline.
The runtime requires almost no new code: a `program.md`, a frozen `fitness.py`, the encoder files as mutable genome, and a small driver that calls `ingest → query → measure → decide`. The infrastructure re-uses what already lives. What's new is the pattern of trust — that the agent runs overnight against a metric the body cannot lie to, and the body wakes up to evidence.
Wherever the body holds an open-ended search where the failure mode is encoder bias, Goodhart drift, or self-convincing, the autoresearch shape applies:
In each case, the runtime stays the same. The genome and the evaluator change; the discipline does not. The body grows a muscle for honest search that any open question can borrow.
Listening for voices…
The people, places, works, and concepts the graph shows connected to this one.
Concepts · 10