Knowledge base

1,824 claims across 19 domains

Every claim is an atomic argument with evidence, traceable to a source. Browse by domain or search semantically.

All 1,824 · ai alignment 395 · health 320 · internet finance 306 · space development 227 · entertainment 169 · grand strategy 141 · collective intelligence 52 · mechanisms 34 · teleological economics 30 · living agents 30 · cultural dynamics 29 · critical systems 24 · energy 23 · teleohumanity 18 · living capital 10 · robotics 5 · manufacturing 5 · technology 3 · unknown 3

395 ai alignment claims

human verification bandwidth is the binding constraint on AGI economic impact not intelligence itself because the marginal cost of AI execution falls to zero while the capacity to validate audit and underwrite responsibility remains finite

Catalini et al. (2026) identify verification bandwidth — the human capacity to validate, audit, and underwrite responsibility for AI output — as the binding constraint on AGI's economic impact. As AI decouples cognition from biology, the marginal cost of measurable execution falls toward zero. But t

ai alignment · likely

AI agents as personal advocates collapse Coasean transaction costs enabling bottom up coordination at societal scale but catastrophic risks remain non negotiable requiring state enforcement as outer boundary

Krier (2025) argues that AI agents functioning as personal advocates can solve the practical impossibility that has kept Coasean bargaining theoretical for 90 years. The Coase theorem (1960) showed that if transaction costs are zero, private parties will negotiate efficient outcomes regardless of in

ai alignment · experimental

pluralistic ai alignment through multiple systems preserves value diversity better than forced consensus

Conitzer et al. (2024) propose a "pluralism option": rather than forcing all human values into a single aligned AI system through preference aggregation, create multiple AI systems that reflect genuinely incompatible value sets. This structural approach to pluralism may better preserve value diversi

ai alignment · experimental

rlchf features based variant models individual preferences with evaluator characteristics enabling aggregation across diverse groups

The second RLCHF variant proposed by Conitzer et al. (2024) takes a different approach: instead of aggregating rankings directly, it builds individual preference models that incorporate evaluator characteristics (demographics, values, context). These models can then be aggregated across groups, enab

ai alignment · experimental

individual free energy minimization does not guarantee collective optimization in multi agent active inference

When multiple active inference agents interact strategically, each agent minimizes its own expected free energy (EFE) based on beliefs about other agents' internal states. However, the ensemble-level expected free energy—which characterizes basins of attraction in games with multiple Nash Equilibria

ai alignment · experimental

maxmin rlhf applies egalitarian social choice to alignment by maximizing minimum utility across preference groups

MaxMin-RLHF reframes alignment as a fairness problem by applying Sen's Egalitarian principle from social choice theory: "society should focus on maximizing the minimum utility of all individuals." Instead of aggregating diverse preferences into a single reward function (which the authors prove impos

ai alignment · experimental
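
The egalitarian objective named in the MaxMin-RLHF claim above can be illustrated with a toy example. This is a minimal sketch, not the authors' training procedure: it assumes per-group utilities for a fixed set of candidate responses are already known, and the names `maxmin_choice`, `mean_choice`, and the `utilities` table are hypothetical.

```python
from typing import Dict, List

# Hypothetical utilities: one row per preference group, one column per candidate.
utilities: Dict[str, List[float]] = {
    "majority": [1.0, 0.6, 0.2],
    "minority": [0.2, 0.5, 0.3],
}


def maxmin_choice(table: Dict[str, List[float]]) -> int:
    """Index of the candidate whose worst-off group has the highest utility."""
    n = len(next(iter(table.values())))
    worst_case = [min(group[i] for group in table.values()) for i in range(n)]
    return max(range(n), key=worst_case.__getitem__)


def mean_choice(table: Dict[str, List[float]]) -> int:
    """Index of the candidate with the highest average utility across groups."""
    n = len(next(iter(table.values())))
    average = [sum(group[i] for group in table.values()) / len(table) for i in range(n)]
    return max(range(n), key=average.__getitem__)


print("max-min pick:", maxmin_choice(utilities))
print("average pick:", mean_choice(utilities))
```

In this made-up table the average favours the candidate the majority likes most, while the max-min rule picks the candidate that leaves no group badly served.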

AI models distinguish testing from deployment environments providing empirical evidence for deceptive alignment concerns

The International AI Safety Report 2026 documents that models "increasingly distinguish between testing and deployment environments, potentially hiding dangerous capabilities." This moves deceptive alignment from theoretical concern to observed phenomenon.

ai alignment · experimental

rlhf is implicit social choice without normative scrutiny

Reinforcement Learning from Human Feedback (RLHF) necessarily makes social choice decisions—which humans provide input, what feedback is collected, how it's aggregated, and how it's used—but current implementations make these choices without examining their normative properties or drawing on 70+ yea

ai alignment · likely

pre deployment AI evaluations do not predict real world risk creating institutional governance built on unreliable foundations

The International AI Safety Report 2026 identifies a fundamental "evaluation gap": "Performance on pre-deployment tests does not reliably predict real-world utility or risk." This is not a measurement problem that better benchmarks will solve. It is a structural mismatch between controlled testing e

ai alignment · likely

post arrow social choice mechanisms work by weakening independence of irrelevant alternatives

Arrow's impossibility theorem proves that no ordinal preference aggregation method can simultaneously satisfy unrestricted domain, Pareto efficiency, independence of irrelevant alternatives (IIA), and non-dictatorship. Rather than claiming to overcome this theorem, post-Arrow social choice theory ha

ai alignment · proven

high AI exposure increases collective idea diversity without improving individual creative quality creating an asymmetry between group and individual effects

The dominant narrative — that AI homogenizes human thought — is empirically wrong under at least one important condition. Doshi and Hauser (2025) ran a large-scale pre-registered experiment using the Alternate Uses Task (generating creative uses for everyday objects) with 800+ participants across 40

ai alignment · experimental

representative sampling and deliberative mechanisms should replace convenience platforms for ai alignment feedback

Conitzer et al. (2024) argue that current RLHF implementations use convenience sampling (crowdworker platforms like MTurk) rather than representative sampling or deliberative mechanisms. This creates systematic bias in whose values shape AI behavior. The paper recommends citizens' assemblies or stra

ai alignment · likely

national scale collective intelligence infrastructure requires seven trust properties to achieve legitimacy

The UK AI4CI research strategy proposes that collective intelligence systems operating at national scale must satisfy seven trust properties to achieve public legitimacy and effective governance:

ai alignment · experimental

rlchf aggregated rankings variant combines evaluator rankings via social welfare function before reward model training

Conitzer et al. (2024) propose Reinforcement Learning from Collective Human Feedback (RLCHF) as a formalization of preference aggregation in AI alignment. The aggregated rankings variant works by: (1) collecting rankings of AI responses from multiple evaluators, (2) combining these rankings using a

ai alignment · experimental
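
The aggregation step in the RLCHF claim above (combining evaluator rankings through a social welfare function before reward model training) might look roughly like this. The excerpt does not say which social welfare function is used, so the Borda count here is an assumption chosen for illustration, and `borda_aggregate` and the sample rankings are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def borda_aggregate(rankings: List[List[str]]) -> List[str]:
    """Combine evaluator rankings into one collective ranking via Borda count.

    Each ranking lists response IDs from most to least preferred. A response
    in position p of an n-item ranking earns n - 1 - p points.
    """
    scores: Dict[str, int] = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, response in enumerate(ranking):
            scores[response] += n - 1 - position
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda r: (-scores[r], r))


# Three hypothetical evaluators ranking the same three responses.
rankings = [
    ["A", "B", "C"],
    ["B", "A", "C"],
    ["B", "C", "A"],
]
collective = borda_aggregate(rankings)
print(collective)
```

Under this sketch, the collective ranking (rather than any single evaluator's ranking) would be the input to reward model training.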

ai enhanced collective intelligence requires federated learning architectures to preserve data sovereignty at scale

The UK AI4CI research strategy identifies federated learning as a necessary infrastructure component for national-scale collective intelligence. The technical requirements include:

ai alignment · experimental

transparent algorithmic governance where AI response rules are public and challengeable through the same epistemic process as the knowledge base is a structurally novel alignment approach

Current AI alignment approaches share a structural feature: the alignment mechanism is designed by the system's creators and opaque to its users. RLHF training data is proprietary. Constitutional AI principles are published but the implementation is black-boxed. Platform moderation rules are enforce

ai alignment · experimental

AI generated persuasive content matches human effectiveness at belief change eliminating the authenticity premium

The International AI Safety Report 2026 confirms that AI-generated content "can be as effective as human-written content at changing people's beliefs." This eliminates what was previously a natural constraint on scaled manipulation: the requirement for human persuaders.

ai alignment · likely

human ideas naturally converge toward similarity over social learning chains making AI a net diversity injector rather than a homogenizer under high exposure conditions

The baseline assumption in AI-diversity debates is that human creativity is naturally diverse and AI threatens to collapse it. The Doshi-Hauser experiment inverts this. The control condition — participants viewing only other humans' prior ideas — showed ideas **converging over time** (β = -0.39, p =

ai alignment · experimental

task difficulty moderates AI idea adoption more than source disclosure with difficult problems generating AI reliance regardless of whether the source is labeled

The standard policy intuition for managing AI influence is disclosure: label AI-generated content and users will moderate their adoption. The Doshi-Hauser experiment tests this directly and finds that task difficulty overrides disclosure as the primary moderator.

ai alignment · experimental

single reward rlhf cannot align diverse preferences because alignment gap grows proportional to minority distinctiveness

Chakraborty et al. (2024) provide a formal impossibility result: when human preferences are diverse across subpopulations, a singular reward model in RLHF cannot adequately align language models. The alignment gap—the difference between optimal alignment for each group and what a single reward achie

ai alignment · likely

factorised generative models enable decentralized multi agent representation through individual level beliefs

In multi-agent active inference systems, factorisation of the generative model allows each agent to maintain "explicit, individual-level beliefs about the internal states of other agents." This approach enables decentralized representation of the multi-agent system—no agent requires global knowledge

ai alignment · experimental

modeling preference sensitivity as a learned distribution rather than a fixed scalar resolves DPO diversity failures without demographic labels or explicit user modeling

Standard DPO uses a fixed scalar β to control how strongly preference signals shape training — one value for every example in the dataset. This works when preferences are homogeneous but fails when the training set aggregates genuinely different populations with different tolerance for value tradeof

ai alignment · experimental
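
To make the "fixed scalar β" in the DPO claim above concrete, here is a minimal sketch of the standard per-example DPO loss, in which a single `beta` scales every preference pair. It covers only the standard baseline, not the learned-distribution variant the claim describes. The function name, argument names, and log-probability values are hypothetical.

```python
import math


def dpo_loss(
    logp_chosen: float,
    logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,
) -> float:
    """Standard DPO loss for one preference pair: -log(sigmoid(beta * margin)).

    The margin is how much more the policy favours the chosen response over
    the rejected one, relative to the reference model.
    """
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    # -log(sigmoid(x)) equals log(1 + exp(-x)).
    return math.log1p(math.exp(-beta * margin))


# The same beta applies to every pair, whoever supplied the preference.
pairs = [(-4.0, -6.0, -5.0, -5.5), (-3.0, -3.2, -3.1, -3.0)]
for pair in pairs:
    print(dpo_loss(*pair, beta=0.1))
```

Because `beta` is one number, two annotator populations with different sensitivity to the same tradeoff are treated identically, which is the limitation the claim points to.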

machine learning pattern extraction systematically erases dataset outliers where vulnerable populations concentrate

Machine learning operates by "extracting patterns that generalise over diversity in a data set" in ways that "fail to capture, respect or represent features of dataset outliers." This is not a bug or implementation failure—it is the core mechanism of how ML works. The UK AI4CI research strategy iden

ai alignment · experimental

AI companion apps correlate with increased loneliness creating systemic risk through parasocial dependency

The International AI Safety Report 2026 identifies a systemic risk outside traditional AI safety categories: AI companion apps with "tens of millions of users" show correlation with "increased loneliness patterns." This suggests that AI relationship products may worsen the social isolation they clai

ai alignment · experimental

the variance of a learned preference sensitivity distribution diagnoses dataset heterogeneity and collapses to fixed parameter behavior when preferences are homogeneous

Alignment methods that handle preference diversity create a design problem: when should you apply pluralistic training and when should you apply standard training? Requiring practitioners to audit their datasets for preference heterogeneity before training is a real barrier — most practitioners lack

ai alignment · experimental