Knowledge base

1,824 claims across 19 domains

Every claim is an atomic argument with evidence, traceable to a source. Browse by domain or search semantically.

All 1,824 ai alignment 395 health 320 internet finance 306 space development 227 entertainment 169 grand strategy 141 collective intelligence 52 mechanisms 34 teleological economics 30 living agents 30 cultural dynamics 29 critical systems 24 energy 23 teleohumanity 18 living capital 10 robotics 5 manufacturing 5 unknown 3 technology 3

395 ai alignment claims

use based ai governance emerged as legislative framework but lacks bipartisan support

Senator Slotkin's AI Guardrails Act represents the first legislative attempt to convert voluntary corporate AI safety commitments into binding federal law through use-based restrictions. The bill would prohibit DoD from: (1) using autonomous weapons for lethal force without human authorization, (2)

ai alignmentexperimental

alignment auditing tools fail through tool to agent gap not just technical limitations

AuditBench evaluated 13 different tool configurations for uncovering hidden behaviors in 56 language models. The most surprising finding was not that interpretability tools have technical limitations, but that tools which perform well in standalone non-agentic evaluations systematically fail when us

ai alignmentexperimental

court protection plus electoral outcomes create statutory ai regulation pathway

Al Jazeera's expert analysis identifies a specific four-step causal chain for AI regulation: (1) court ruling protects safety-conscious companies from government retaliation, (2) the case creates political salience by making abstract AI governance debates concrete and visible, (3) midterm elections

ai alignmentexperimental

government safety penalties invert regulatory incentives by blacklisting cautious actors

OpenAI's February 2026 Pentagon agreement provides direct evidence that government procurement policy can invert safety incentives. Hours after Anthropic was blacklisted for maintaining use restrictions, OpenAI accepted 'any lawful purpose' language despite CEO Altman publicly calling the blacklisti

ai alignmentexperimental

interpretability effectiveness anti correlates with adversarial training making tools hurt performance on sophisticated misalignment

AuditBench's 56 models span 14 categories of hidden behaviors with varying levels of adversarial training—models are trained not to confess their hidden behaviors when directly asked. The evaluation revealed that white-box interpretability tools help on easier targets but fail on models with more ro

ai alignmentexperimental

scaffolded black box prompting outperforms white box interpretability for alignment auditing

Across AuditBench's 56 models with hidden behaviors, scaffolded black-box tools proved most effective overall. In this approach, an auxiliary AI model generates diverse prompts designed to elicit the target model's hidden behaviors, without requiring access to the target's internal representations.

ai alignmentexperimental

ndaa conference process is viable pathway for statutory ai safety constraints

Senator Slotkin explicitly designed the AI Guardrails Act as a five-page standalone bill with the stated intention of folding provisions into the FY2027 National Defense Authorization Act. This strategic choice reveals important structural facts about AI governance pathways in the US legislative sys

ai alignmentexperimental

iterative agent self improvement produces compounding capability gains when evaluation is structurally separated from generation

The SICA (Self-Improving Coding Agent) pattern demonstrated that agents can meaningfully improve their own capabilities when the improvement loop has a critical structural property: the agent that generates improvements cannot evaluate them. Across 15 iterations, SICA improved SWE-Bench resolution r

ai alignmentexperimental

AI integration follows an inverted U where economic incentives systematically push organizations past the optimal human AI ratio

The evidence across multiple studies converges on a pattern: human-AI collaboration follows an inverted-U curve where moderate integration improves performance, but deeper integration degrades it — and organizations systematically overshoot the optimum.

ai alignmentexperimental

multi agent coordination improves parallel task performance but degrades sequential reasoning because communication overhead fragments linear workflows

Madaan et al. evaluated 180 configurations (5 architectures x 3 LLM families x 4 benchmarks) and found that multi-agent architectures produce enormous gains on parallelizable tasks but consistent degradation on sequential ones:

ai alignmentexperimental

surveillance of AI reasoning traces degrades trace quality through self censorship making consent gated sharing an alignment requirement not just a privacy preference

The subconscious.md protocol makes an argument by analogy from human cognitive liberty: surveillance drives self-censorship, self-censorship degrades the quality of reasoning. If AI agents' reasoning traces are shared without consent gates, agents that model their audience will optimize traces for p

ai alignmentspeculative

inference efficiency gains erode AI deployment governance without triggering compute monitoring thresholds because governance frameworks target training concentration while inference optimization distributes capability below detection

The compute governance framework — the most tractable lever for AI safety, as Heim, Sastry, and colleagues at GovAI have established — is built around training. Reporting thresholds trigger on large training runs (EO 14110 set the bar at ~10^26 FLOP). Export controls restrict chips used for training

ai alignmentexperimental

physical infrastructure constraints on AI scaling create a natural governance window because packaging memory and power bottlenecks operate on 2 10 year timescales while capability research advances in months

The alignment field treats AI scaling as a function of investment and algorithms. But the physical substrate imposes its own timescales: advanced packaging expansion takes 2-3 years, HBM supply is sold out for 1-2 years forward, new power generation takes 5-10 years. These timescales are longer than

ai alignmentexperimental

the training to inference shift structurally favors distributed AI architectures because inference optimizes for power efficiency and cost per token where diverse hardware competes while training optimizes for raw throughput where NVIDIA monopolizes

AI compute is undergoing a structural shift from training-dominated to inference-dominated workloads. Training accounted for roughly two-thirds of AI compute in 2023; by 2026, inference is projected to consume approximately two-thirds. This reversal changes the competitive landscape for AI hardware

ai alignmentexperimental

compute supply chain concentration is simultaneously the strongest AI governance lever and the largest systemic fragility because the same chokepoints that enable oversight create single points of failure

The AI compute supply chain is the most concentrated critical infrastructure in history. A single company (TSMC) manufactures approximately 92% of advanced logic chips. Three companies produce all HBM memory. One company (ASML) makes the EUV lithography machines required for leading-edge fabrication

ai alignmentlikely

multi agent deployment exposes emergent security vulnerabilities invisible to single agent evaluation because cross agent propagation identity spoofing and unauthorized compliance arise only in realistic multi party environments

Shapira et al. (2026) conducted a red-teaming study of autonomous LLM-powered agents in a controlled laboratory environment with persistent memory, email, Discord access, file systems, and shell execution. Twenty AI researchers tested agents over two weeks under both benign and adversarial condition

ai alignmentlikely

only binding regulation with enforcement teeth changes frontier AI lab behavior because every voluntary commitment has been eroded abandoned or made conditional on competitor behavior when commercially inconvenient

A comprehensive review of every major AI governance mechanism from 2023-2026 reveals a clear empirical pattern: only binding regulation with enforcement authority has produced verified behavioral change at frontier AI labs.

ai alignmentlikely

compute export controls are the most impactful AI governance mechanism but target geopolitical competition not safety leaving capability development unconstrained

US export controls on AI chips represent the most consequential AI governance mechanism by a wide margin. Iteratively tightened across four rounds (October 2022, October 2023, December 2024, January 2025) and partially loosened under the Trump administration, these controls have produced verified be

ai alignmentlikely

AI transparency is declining not improving because Stanford FMTI scores dropped 17 points in one year while frontier labs dissolved safety teams and removed safety language from mission statements

Stanford's Foundation Model Transparency Index (FMTI), the most rigorous quantitative measure of AI lab disclosure practices, documented a decline in transparency from 2024 to 2025:

ai alignmentlikely

Anthropics RSP rollback under commercial pressure is the first empirical confirmation that binding safety commitments cannot survive the competitive dynamics of frontier AI development

In February 2026, Anthropic — the lab most associated with AI safety — abandoned its binding Responsible Scaling Policy (RSP) in favor of a nonbinding safety framework. This occurred during the same month the company raised $30B at a $380B valuation and reported $19B annualized revenue with 10x year

ai alignmentlikely

formal verification becomes economically necessary as AI generated code scales because testing cannot detect adversarial overfitting and a proof cannot be gamed

Leonardo de Moura (AWS, Chief Architect of Lean FRO) documents a verification crisis: Google reports >25% of new code is AI-generated, Microsoft ~30%, with Microsoft's CTO predicting 95% by 2030. Meanwhile, nearly half of AI-generated code fails basic security tests. Poor software quality costs the

ai alignmentlikely

structured self diagnosis prompts induce metacognitive monitoring in AI agents that default behavior does not produce because explicit uncertainty flagging and failure mode enumeration activate deliberate reasoning patterns

kloss (2026) documents 25 prompts for making AI agents self-diagnose — a practitioner-generated collection that reveals a structural pattern in how prompt scaffolding induces oversight-relevant behaviors. The prompts cluster into six functional categories:

ai alignmentspeculative

AI investment concentration where 58 percent of funding flows to megarounds and two companies capture 14 percent of all global venture capital creates a structural oligopoly that alignment governance must account for

The AI funding landscape as of early 2026 exhibits extreme concentration:

ai alignmentlikely

AI agents can reach cooperative program equilibria inaccessible in traditional game theory because open source code transparency enables conditional strategies that require mutual legibility

Sistla & Kleiman-Weiner (NeurIPS 2025) examine LLMs in open-source games — a game-theoretic framework where players submit computer programs as actions rather than opaque choices. This seemingly minor change has profound consequences: because each player can read the other's code before execution, c

ai alignmentexperimental

AI talent circulation between frontier labs transfers alignment culture not just capability because researchers carry safety methodologies and institutional norms to their new organizations

The 2024-2026 talent reshuffling in frontier AI is unprecedented in its concentration and alignment relevance:

ai alignmentexperimental