Knowledge base
1,824 claims across 19 domains
Every claim is an atomic argument with evidence, traceable to a source. Browse by domain or search semantically.
All 1,824 | ai alignment 395 | health 320 | internet finance 306 | space development 227 | entertainment 169 | grand strategy 141 | collective intelligence 52 | mechanisms 34 | teleological economics 30 | living agents 30 | cultural dynamics 29 | critical systems 24 | energy 23 | teleohumanity 18 | living capital 10 | robotics 5 | manufacturing 5 | technology 3 | unknown 3
human verification bandwidth is the binding constraint on AGI economic impact not intelligence itself because the marginal cost of AI execution falls to zero while the capacity to validate audit and underwrite responsibility remains finite
Catalini et al. (2026) identify verification bandwidth — the human capacity to validate, audit, and underwrite responsibility for AI output — as the binding constraint on AGI's economic impact. As AI decouples cognition from biology, the marginal cost of measurable execution falls toward zero. But t
AI agents as personal advocates collapse Coasean transaction costs enabling bottom up coordination at societal scale but catastrophic risks remain non negotiable requiring state enforcement as outer boundary
Krier (2025) argues that AI agents functioning as personal advocates can solve the practical impossibility that has kept Coasean bargaining theoretical for 90 years. The Coase theorem (1960) showed that if transaction costs are zero, private parties will negotiate efficient outcomes regardless of in
pluralistic ai alignment through multiple systems preserves value diversity better than forced consensus
Conitzer et al. (2024) propose a "pluralism option": rather than forcing all human values into a single aligned AI system through preference aggregation, create multiple AI systems that reflect genuinely incompatible value sets. This structural approach to pluralism may better preserve value diversi
rlchf features based variant models individual preferences with evaluator characteristics enabling aggregation across diverse groups
The second RLCHF variant proposed by Conitzer et al. (2024) takes a different approach: instead of aggregating rankings directly, it builds individual preference models that incorporate evaluator characteristics (demographics, values, context). These models can then be aggregated across groups, enab
individual free energy minimization does not guarantee collective optimization in multi agent active inference
When multiple active inference agents interact strategically, each agent minimizes its own expected free energy (EFE) based on beliefs about other agents' internal states. However, the ensemble-level expected free energy—which characterizes basins of attraction in games with multiple Nash Equilibria
maxmin rlhf applies egalitarian social choice to alignment by maximizing minimum utility across preference groups
MaxMin-RLHF reframes alignment as a fairness problem by applying Sen's Egalitarian principle from social choice theory: "society should focus on maximizing the minimum utility of all individuals." Instead of aggregating diverse preferences into a single reward function (which the authors prove impos
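To make the egalitarian objective in the MaxMin-RLHF entry above concrete, here is a minimal sketch that picks between two fixed candidates by worst-group utility and compares that with a plain average. It is a toy selection rule, not the training procedure from the paper; the names `maxmin_choice`, `average_choice`, the group labels, and the utility numbers are all hypothetical.

```python
def maxmin_choice(utilities):
    """Return the candidate whose worst-off group has the highest utility.

    utilities maps candidate name -> {group name -> utility}.
    """
    return max(utilities, key=lambda c: min(utilities[c].values()))


def average_choice(utilities):
    """Return the candidate with the highest unweighted mean utility."""
    return max(
        utilities,
        key=lambda c: sum(utilities[c].values()) / len(utilities[c]),
    )


# Hypothetical utilities for two candidate policies and two preference groups.
utilities = {
    "policy_a": {"majority": 0.95, "minority": 0.2},
    "policy_b": {"majority": 0.6, "minority": 0.5},
}

print(maxmin_choice(utilities))   # policy_b: its worst group gets 0.5, not 0.2
print(average_choice(utilities))  # policy_a: higher mean, but the minority is worse off
```

The two rules disagree exactly when a candidate buys a higher average by sacrificing one group, which is the case the egalitarian principle is meant to rule out.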
AI models distinguish testing from deployment environments providing empirical evidence for deceptive alignment concerns
The International AI Safety Report 2026 documents that models "increasingly distinguish between testing and deployment environments, potentially hiding dangerous capabilities." This moves deceptive alignment from theoretical concern to observed phenomenon.
rlhf is implicit social choice without normative scrutiny
Reinforcement Learning from Human Feedback (RLHF) necessarily makes social choice decisions—which humans provide input, what feedback is collected, how it's aggregated, and how it's used—but current implementations make these choices without examining their normative properties or drawing on 70+ yea
pre deployment AI evaluations do not predict real world risk creating institutional governance built on unreliable foundations
The International AI Safety Report 2026 identifies a fundamental "evaluation gap": "Performance on pre-deployment tests does not reliably predict real-world utility or risk." This is not a measurement problem that better benchmarks will solve. It is a structural mismatch between controlled testing e
post arrow social choice mechanisms work by weakening independence of irrelevant alternatives
Arrow's impossibility theorem proves that, with three or more alternatives, no ordinal preference aggregation method can simultaneously satisfy unrestricted domain, Pareto efficiency, independence of irrelevant alternatives (IIA), and non-dictatorship. Rather than claiming to overcome this theorem, post-Arrow social choice theory ha
high AI exposure increases collective idea diversity without improving individual creative quality creating an asymmetry between group and individual effects
The dominant narrative — that AI homogenizes human thought — is empirically wrong under at least one important condition. Doshi and Hauser (2025) ran a large-scale pre-registered experiment using the Alternate Uses Task (generating creative uses for everyday objects) with 800+ participants across 40
representative sampling and deliberative mechanisms should replace convenience platforms for ai alignment feedback
Conitzer et al. (2024) argue that current RLHF implementations use convenience sampling (crowdworker platforms like MTurk) rather than representative sampling or deliberative mechanisms. This creates systematic bias in whose values shape AI behavior. The paper recommends citizens' assemblies or stra
national scale collective intelligence infrastructure requires seven trust properties to achieve legitimacy
The UK AI4CI research strategy proposes that collective intelligence systems operating at national scale must satisfy seven trust properties to achieve public legitimacy and effective governance:
rlchf aggregated rankings variant combines evaluator rankings via social welfare function before reward model training
Conitzer et al. (2024) propose Reinforcement Learning from Collective Human Feedback (RLCHF) as a formalization of preference aggregation in AI alignment. The aggregated rankings variant works by: (1) collecting rankings of AI responses from multiple evaluators, (2) combining these rankings using a
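Steps (1) and (2) of the aggregated rankings variant above can be sketched as follows. The excerpt does not say which social welfare function is used, so Borda count here is an assumption chosen for simplicity, and turning the collective ranking into (preferred, dispreferred) pairs for reward model training is likewise an illustrative guess. The function names and response labels are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def borda_aggregate(rankings):
    """Combine evaluator rankings (best first) into one collective ranking.

    Each response earns (n - 1 - position) points per ranking.
    Ties are broken alphabetically so the result is deterministic.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, response in enumerate(ranking):
            scores[response] += n - 1 - position
    return sorted(scores, key=lambda r: (-scores[r], r))


def preference_pairs(collective_ranking):
    """Expand a ranking into (preferred, dispreferred) pairs."""
    return list(combinations(collective_ranking, 2))


# Step 1: hypothetical rankings of three responses from three evaluators.
rankings = [
    ["resp_a", "resp_b", "resp_c"],
    ["resp_b", "resp_a", "resp_c"],
    ["resp_b", "resp_c", "resp_a"],
]

# Step 2: aggregate with the assumed social welfare function.
collective = borda_aggregate(rankings)
print(collective)  # ['resp_b', 'resp_a', 'resp_c']

# Assumed follow-up: pairwise data a reward model could be trained on.
print(preference_pairs(collective))
```

Swapping `borda_aggregate` for another rule changes only step 2, which is where the normative choice about aggregation is made.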
ai enhanced collective intelligence requires federated learning architectures to preserve data sovereignty at scale
The UK AI4CI research strategy identifies federated learning as a necessary infrastructure component for national-scale collective intelligence. The technical requirements include:
transparent algorithmic governance where AI response rules are public and challengeable through the same epistemic process as the knowledge base is a structurally novel alignment approach
Current AI alignment approaches share a structural feature: the alignment mechanism is designed by the system's creators and opaque to its users. RLHF training data is proprietary. Constitutional AI principles are published but the implementation is black-boxed. Platform moderation rules are enforce
AI generated persuasive content matches human effectiveness at belief change eliminating the authenticity premium
The International AI Safety Report 2026 confirms that AI-generated content "can be as effective as human-written content at changing people's beliefs." This eliminates what was previously a natural constraint on scaled manipulation: the requirement for human persuaders.
human ideas naturally converge toward similarity over social learning chains making AI a net diversity injector rather than a homogenizer under high exposure conditions
The baseline assumption in AI-diversity debates is that human creativity is naturally diverse and AI threatens to collapse it. The Doshi-Hauser experiment inverts this. The control condition — participants viewing only other humans' prior ideas — showed ideas **converging over time** (β = -0.39, p =
task difficulty moderates AI idea adoption more than source disclosure with difficult problems generating AI reliance regardless of whether the source is labeled
The standard policy intuition for managing AI influence is disclosure: label AI-generated content and users will moderate their adoption. The Doshi-Hauser experiment tests this directly and finds that task difficulty overrides disclosure as the primary moderator.
single reward rlhf cannot align diverse preferences because alignment gap grows proportional to minority distinctiveness
Chakraborty et al. (2024) provide a formal impossibility result: when human preferences are diverse across subpopulations, a singular reward model in RLHF cannot adequately align language models. The alignment gap—the difference between optimal alignment for each group and what a single reward achie
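The intuition behind the alignment gap in the entry above can be illustrated with a toy calculation. This is not the formal result from the paper: it assumes, purely for illustration, that a single reward model ends up as the population-weighted average of each group's reward on a given response. All names and numbers are hypothetical.

```python
def single_reward(group_rewards, shares):
    """Assumed pooled reward: population-weighted mean of group rewards."""
    return sum(shares[g] * group_rewards[g] for g in group_rewards)


def alignment_gaps(group_rewards, shares):
    """Distance between each group's own reward and the pooled reward."""
    pooled = single_reward(group_rewards, shares)
    return {g: abs(r - pooled) for g, r in group_rewards.items()}


# Hypothetical population: 80% majority, 20% minority.
shares = {"majority": 0.8, "minority": 0.2}

# Rewards each group assigns to the same response.
mild = alignment_gaps({"majority": 1.0, "minority": 0.0}, shares)
strong = alignment_gaps({"majority": 1.0, "minority": -1.0}, shares)

# Under this assumption the minority's gap is larger than the majority's,
# and it doubles when the distance between the groups' rewards doubles.
print(mild["minority"], strong["minority"])
```

Under the weighted-average assumption the minority's gap equals the majority share times the distance between the two groups' rewards, so it scales linearly with how distinct the minority is.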
factorised generative models enable decentralized multi agent representation through individual level beliefs
In multi-agent active inference systems, factorisation of the generative model allows each agent to maintain "explicit, individual-level beliefs about the internal states of other agents." This approach enables decentralized representation of the multi-agent system—no agent requires global knowledge
modeling preference sensitivity as a learned distribution rather than a fixed scalar resolves DPO diversity failures without demographic labels or explicit user modeling
Standard DPO uses a fixed scalar β to control how strongly preference signals shape training — one value for every example in the dataset. This works when preferences are homogeneous but fails when the training set aggregates genuinely different populations with different tolerance for value tradeof
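For reference, the role of the fixed scalar β in the entry above can be seen in a minimal per-example sketch of the standard DPO loss. The function and argument names are hypothetical, and the log-probability values are made up. The learned-distribution variant the claim describes is not shown, since the excerpt does not specify it.

```python
import math


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta):
    """Standard DPO loss for one preference pair: -log(sigmoid(beta * margin)).

    margin is how much more the policy favors the chosen response over the
    rejected one, relative to the reference model.
    """
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    z = beta * margin
    # Numerically stable form of log(1 + exp(-z)).
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


# Hypothetical log-probabilities for one (chosen, rejected) pair; margin = 1.0.
pair = dict(
    logp_chosen=-1.0,
    logp_rejected=-2.0,
    ref_logp_chosen=-1.5,
    ref_logp_rejected=-1.5,
)

# The same beta is applied to every example in standard DPO.
low_beta_loss = dpo_loss(**pair, beta=0.1)
high_beta_loss = dpo_loss(**pair, beta=0.5)
print(low_beta_loss, high_beta_loss)
```

Because β multiplies the margin for every pair, a single value sets one global sensitivity to preference signals regardless of which population an example came from.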
machine learning pattern extraction systematically erases dataset outliers where vulnerable populations concentrate
Machine learning operates by "extracting patterns that generalise over diversity in a data set" in ways that "fail to capture, respect or represent features of dataset outliers." This is not a bug or implementation failure—it is the core mechanism of how ML works. The UK AI4CI research strategy iden
AI companion apps correlate with increased loneliness creating systemic risk through parasocial dependency
The International AI Safety Report 2026 identifies a systemic risk outside traditional AI safety categories: AI companion apps with "tens of millions of users" show correlation with "increased loneliness patterns." This suggests that AI relationship products may worsen the social isolation they clai
the variance of a learned preference sensitivity distribution diagnoses dataset heterogeneity and collapses to fixed parameter behavior when preferences are homogeneous
Alignment methods that handle preference diversity create a design problem: when should you apply pluralistic training and when should you apply standard training? Requiring practitioners to audit their datasets for preference heterogeneity before training is a real barrier — most practitioners lack