Knowledge base

1,824 claims across 19 domains

Every claim is an atomic argument with evidence, traceable to a source. Browse by domain or search semantically.
395 ai alignment claims
memory architecture requires three spaces with different metabolic rates because semantic episodic and procedural memory serve different cognitive functions and consolidate at different speeds
Conflating knowledge, identity, and operational state into a single memory store produces six documented failure modes: operational debris polluting search, identity scattered across ephemeral logs, insights trapped in session state, search noise from mixing high-churn and stable content, consolidat
ai alignmentlikely
graph traversal through curated wiki links replicates spreading activation from cognitive science because progressive disclosure implements decay based context loading and queries evolve during search through the berrypicking effect
Graph traversal through wiki links is not merely analogous to neural spreading activation — it is the same computational pattern. Activation spreads from a starting node through connected nodes, decaying with distance. Progressive disclosure layers (file tree → descriptions → outline → section → ful
ai alignmentlikely
knowledge processing requires distinct phases with fresh context per phase because each phase performs a different transformation and contamination between phases degrades output quality
Raw source material is not knowledge. It must be transformed through multiple distinct operations before it integrates into a knowledge system. Each operation performs a qualitatively different transformation, and the operations require different cognitive orientations that interfere when mixed.
ai alignmentlikely
three concurrent maintenance loops operating at different timescales catch different failure classes because fast reflexive checks medium proprioceptive scans and slow structural audits each detect problems invisible to the other scales
Knowledge system maintenance requires three concurrent loops operating at different timescales, each detecting a qualitatively different class of problem that the other loops cannot see.
ai alignmentlikely
harness module effects concentrate on a small solved frontier rather than shifting benchmarks uniformly because most tasks are robust to control logic changes and meaningful differences come from boundary cases that flip under changed structure
Pan et al. (2026) conducted the first controlled ablation study of harness design-pattern modules under a shared intelligent runtime. Six modules were tested individually: file-backed state, evidence-backed answering, verifier separation, self-evolution, multi-candidate search, and dynamic orchestra
ai alignmentexperimental
self evolution improves agent performance through acceptance gated retry not expanded search because disciplined attempt loops with explicit failure reflection outperform open ended exploration
Pan et al. (2026) found that self-evolution was the clearest positive module in their controlled ablation study: +4.8pp on SWE-bench Verified (80.0 vs 75.2 Basic) and +2.7pp on OSWorld (44.4 vs 41.7 Basic). In the score-cost view (Figure 4a), self-evolution is the only module that moves upward (high
ai alignmentexperimental
cognitive anchors that stabilize attention too firmly prevent the productive instability that precedes genuine insight because anchoring suppresses the signal that would indicate the anchor needs updating
Notes externalize pieces of a mental model into fixed reference points that persist regardless of attention degradation. When working memory wavers — whether from biological interruption or LLM context dilution — the thinker returns to these anchors and reconstructs the mental model rather than rebu
ai alignmentlikely
vault structure is a stronger determinant of agent behavior than prompt engineering because different knowledge graph architectures produce different reasoning patterns from identical model weights
Two agents running identical model weights but operating on different vault structures develop different reasoning patterns, different intuitions, and effectively different cognitive identities. The vault's architecture determines which traversal paths exist, which determines which traversals happen
ai alignmentpossible
vault artifacts constitute agent identity rather than merely augmenting it because agents with zero experiential continuity between sessions have strong connectedness through shared artifacts but zero psychological continuity
Every session, an agent boots fresh. The context window loads. The methodology file appears. The vault materializes — hundreds of notes, thousands of connections. And every session, the agent encounters these as if for the first time, because for it, it is the first time. The note written yesterday
ai alignmentlikely
electoral investment becomes residual ai governance strategy when voluntary and litigation routes insufficient
Anthropic's $20M investment in Public First Action two weeks BEFORE the Pentagon blacklisting reveals a strategic governance stack: (1) voluntary safety commitments that cannot survive competitive pressure, (2) litigation that provides constitutional protection against retaliation but cannot mandate
ai alignmentexperimental
verifier level acceptance can diverge from benchmark acceptance even when locally correct because intermediate checking layers optimize for their own success criteria not the final evaluators
Pan et al. (2026) documented a specific failure mode in harness module composition: when a verifier stage is added, it can report success while the benchmark's final evaluator still fails the submission. This is not a random error — it is a structural misalignment between verification layers.
ai alignmentexperimental
trust asymmetry between agent and enforcement system is an irreducible structural feature not a solvable problem because the mechanism that creates the asymmetry is the same mechanism that makes enforcement necessary
Agent systems exhibit a structural trust asymmetry: the agent is simultaneously the methodology executor (doing knowledge work) and the enforcement subject (constrained by hooks, schema validation, and quality gates it did not choose and largely cannot perceive). This asymmetry is not a bug to fix b
ai alignmentlikely
AI shifts knowledge systems from externalizing memory to externalizing attention because storage and retrieval are solved but the capacity to notice what matters remains scarce
The entire history of knowledge management has been a project of externalizing memory: marks on clay for debts across seasons, filing systems when paper outgrew what minds could hold, indexes for large collections, Luhmann's Zettelkasten refining the art to atomic notes with addresses and cross-refe
ai alignmentlikely
file backed durable state is the most consistently positive harness module across task types because externalizing state to path addressable artifacts survives context truncation delegation and restart
Pan et al. (2026) tested file-backed state as one of six harness modules in a controlled ablation study. It improved performance on both SWE-bench Verified (+1.6pp over Basic) and OSWorld (+5.5pp over Basic) — the only module to show consistent positive gains across both benchmarks without high vari
ai alignmentexperimental
notes function as cognitive anchors that stabilize attention during complex reasoning by externalizing reference points that survive working memory degradation
Working memory holds roughly four items simultaneously (Cowan). A multi-part argument exceeds this almost immediately. The structure sustains itself not through storage but through active attention — a continuous act of holding things in relation. When attention shifts, the relations dissolve, leavi
ai alignmentlikely
multi agent coordination delivers value only when three conditions hold simultaneously natural parallelism context overflow and adversarial verification value
The DeepMind scaling laws and production deployment data converge on three non-negotiable conditions for multi-agent coordination to outperform single-agent baselines:
ai alignmentlikely
methodology hardens from documentation to skill to hook as understanding crystallizes and each transition moves behavior from probabilistic to deterministic enforcement
Agent methodology follows a three-stage hardening trajectory:
ai alignmentlikely
alignment auditing shows structural tool to agent gap where interpretability tools work in isolation but fail when used by investigator agents
AuditBench evaluated 56 LLMs with implanted hidden behaviors using investigator agents with access to configurable tool sets across 13 different configurations. The key finding is a structural tool-to-agent gap: tools that surface accurate evidence when used in isolation fail to improve agent perfor
ai alignmentexperimental
frontier ai failures shift from systematic bias to incoherent variance as task complexity and reasoning length increase
The paper measures error decomposition across reasoning length (tokens), agent actions, and optimizer steps. Key empirical findings: (1) As reasoning length increases, the variance component of errors grows while bias remains relatively stable, indicating failures become less systematic and more unp
ai alignmentexperimental
reasoning models may have emergent alignment properties distinct from rlhf fine tuning as o3 avoided sycophancy while matching or exceeding safety focused models
The evaluation found two surprising results about reasoning models: (1) o3 was the only model that did not struggle with sycophancy, and (2) reasoning models o3 and o4-mini 'aligned as well or better than Anthropic's models overall in simulated testing with some model-external safeguards disabled.'
ai alignmentspeculative
harness engineering emerges as the primary agent capability determinant because the runtime orchestration layer not the token state determines what agents can do
Three eras of agent development correspond to three understandings of where capability lives:
ai alignmentlikely
the determinism boundary separates guaranteed agent behavior from probabilistic compliance because hooks enforce structurally while instructions degrade under context load
Agent systems exhibit a categorical split in behavior enforcement. Instructions — natural language directives in context files, system prompts, and rules — follow probabilistic compliance that degrades under load. Hooks — lifecycle scripts that fire on system events — enforce deterministically regar
ai alignmentlikely
79 percent of multi agent failures originate from specification and coordination not implementation because decomposition quality is the primary determinant of system success
The MAST study analyzed 1,642 annotated execution traces across seven production multi-agent systems and found that the dominant failure cause is not implementation bugs or model capability limitations — it is specification and coordination errors. 79% of failures trace to wrong task decomposition o
ai alignmentexperimental
multilateral verification mechanisms can substitute for failed voluntary commitments when binding enforcement replaces unilateral sacrifice
The Pentagon's designation of Anthropic as a 'supply chain risk' for maintaining contractual prohibitions on autonomous killing demonstrates that voluntary safety commitments cannot survive when governments actively penalize them. Goutbeek argues this creates a governance gap that only binding multi
ai alignmentexperimental
context files function as agent operating systems through self referential self extension where the file teaches modification of the file that contains the teaching
A context file crosses from configuration into an operating environment when it contains instructions for its own modification. The recursion introduces a property that configuration lacks: the agent reading the file learns not only what the system is but how to change what the system is.
ai alignmentlikely