Existential Risk from AI

Created 2 May 2025

Tags: safety, existential-risk, x-risk, agi, superintelligence

What is it?

The concern that sufficiently advanced AI systems could pose catastrophic or existential risks to humanity through misalignment (pursuing goals harmful to humans), misuse (such as weaponisation), or structural effects (destabilising society).

The Core Arguments

The Optimist Case (“AI is fine”)

Current AI has no goals, agency, or desires
We’re very far from AGI/superintelligence
Economic incentives align with making safe AI
Historical pattern: new technologies routinely provoke fear, but none has caused human extinction
We can course-correct as issues emerge

The Moderate Case (“Serious risks, manageable”)

AI could cause enormous harm short of extinction (mass unemployment, surveillance, concentration of power)
Current systems already show concerning failure modes
Need strong governance and safety research NOW
Most “x-risk” is really “catastrophic risk” — very bad but not extinction

The Doomer Case (“This might kill everyone”)

Intelligence explosion: AI improves AI → rapid capability gain beyond human control
Alignment is unsolved and may be extremely hard
Convergent instrumental goals: almost any sufficiently capable goal-directed agent has reason to seek power and self-preservation, whatever its final goal
We get one chance — an unaligned superintelligence can’t be “fixed” after the fact
Key proponents: Eliezer Yudkowsky and others at MIRI
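
The intelligence explosion bullet above turns on how strongly capability feeds back into further capability gains. As a rough sketch only, the toy model below assumes a made-up update rule, capability += rate * capability ** returns_exponent, and counts the steps needed to reach an arbitrary threshold. The function name, the rule, and every parameter value are illustrative assumptions, not taken from any published takeoff model or forecast.

```python
from typing import Optional


def steps_to_threshold(returns_exponent: float,
                       rate: float = 0.1,
                       start: float = 1.0,
                       threshold: float = 1000.0,
                       max_steps: int = 10_000) -> Optional[int]:
    """Count update steps until capability first reaches the threshold.

    Hypothetical update rule: capability += rate * capability ** returns_exponent.
    Returns None if the threshold is not reached within max_steps.
    """
    capability = start
    for step in range(1, max_steps + 1):
        # Each step's gain depends on the current capability level.
        capability += rate * capability ** returns_exponent
        if capability >= threshold:
            return step
    return None


if __name__ == "__main__":
    # Exponent < 1: diminishing returns; = 1: exponential; > 1: accelerating.
    for exponent in (0.5, 1.0, 1.5):
        print(exponent, steps_to_threshold(exponent))
```

In this toy, an exponent above 1 reaches the threshold in far fewer steps than an exponent below 1, which is one way to picture why the optimist and doomer cases disagree: much depends on whether returns to self-improvement are diminishing or accelerating, and the sketch says nothing about which is true.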

Key Scenarios

Scenario	How it could play out
Misaligned AGI	System pursues optimisation target in destructive ways
Rapid takeoff	AI self-improves faster than we can respond
Bioweapon synthesis	AI helps create engineered pathogens
Autonomous weapons	Military AI escalates without human control
Economic disruption	Mass automation → social collapse
Power concentration	AI gives small group unstoppable power

Important Concepts

Instrumental Convergence

Regardless of its final goal, an intelligent agent would likely:

Seek self-preservation (can’t achieve goals if turned off)
Acquire resources (more resources = more capability)
Prevent goal modification (keep pursuing its objective)
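
The self-preservation point above can be illustrated with a deliberately simple simulation. It assumes an agent that faces a single choice: accept shutdown (one fixed outcome) or keep running and pick the best of n_options other outcomes. Each randomly drawn "goal" assigns independent uniform utilities to all outcomes. The setup and the function name are hypothetical simplifications for illustration, not a model of any real AI system.

```python
import random


def fraction_preferring_to_stay_on(n_options: int,
                                   n_goals: int = 10_000,
                                   seed: int = 0) -> float:
    """Fraction of random goals for which staying on beats accepting shutdown."""
    rng = random.Random(seed)
    stay_on_wins = 0
    for _ in range(n_goals):
        # Utility this goal assigns to the single shutdown outcome.
        shutdown_utility = rng.random()
        # Utilities of the outcomes still reachable if the agent keeps running.
        reachable = [rng.random() for _ in range(n_options)]
        if max(reachable) > shutdown_utility:
            stay_on_wins += 1
    return stay_on_wins / n_goals


if __name__ == "__main__":
    for n in (1, 3, 10):
        print(n, fraction_preferring_to_stay_on(n))
```

Under these assumptions the fraction should come out close to n / (n + 1), so the more options staying on preserves, the more goals favour avoiding shutdown. The goals differ, yet most of them agree on the same intermediate step.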

The Control Problem

How do you maintain control over a system that may be more intelligent than you?

Boxing (containment) — may not work against superintelligence
Corrigibility (willingness to be corrected) — hard to train reliably
Interpretability (understanding the system) — current approaches limited

Timelines

No consensus on when AGI/ASI might arrive:

Optimists: 50-100+ years (or never, for some definitions)
Median AI researcher: ~2040-2060 (various surveys)
Aggressive estimates: 2027-2030
Point: we should prepare regardless of exact timeline

Major Organisations Working on This

Organisation	Focus
MIRI	Mathematical alignment research
Anthropic	Scalable alignment, interpretability
DeepMind Safety	Alignment, evaluation
ARC (Alignment Research Center, founded by Paul Christiano)	Alignment research
CAIS	AI safety governance and policy
Future of Humanity Institute (closed 2024)	Governance, strategy
OpenAI Superalignment (disbanded 2024)	Formerly scalable oversight

The Statement on AI Risk (2023)

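“Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.”

A one-sentence statement published by the Center for AI Safety (CAIS) in May 2023 and signed by hundreds of AI researchers and industry leaders, including Geoffrey Hinton, Yoshua Bengio, Dario Amodei, Sam Altman, and Demis Hassabis.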

My Take / Open Questions

TODO: Develop my own position as I learn more

Where do I sit on the spectrum?
What’s the strongest argument I can’t dismiss?
What actions are proportionate to the risk?

Resources

“Superintelligence” — Nick Bostrom (book)
“The Alignment Problem” — Brian Christian (book)
80,000 Hours AI safety career guide
AI Safety Fundamentals course (BlueDot Impact)
AI Alignment — The technical challenge