Existential Risk from AI
Created 2 May 2025
Tags: safety, existential-risk, x-risk, agi, superintelligence
What is it?
The concern that sufficiently advanced AI systems could pose catastrophic or existential risks to humanity — either through misalignment (pursuing goals harmful to humans), misuse (weaponisation), or structural effects (destabilising society).
The Core Arguments
The Optimist Case (“AI is fine”)
- Current AI has no goals, agency, or desires
- We’re very far from AGI/superintelligence
- Economic incentives align with making safe AI
- Historical pattern: new technologies have repeatedly provoked fear, and none has yet caused extinction
- We can course-correct as issues emerge
The Moderate Case (“Serious risks, manageable”)
- AI could cause enormous harm short of extinction (mass unemployment, surveillance, concentration of power)
- Current systems already show concerning failure modes
- Need strong governance and safety research NOW
- Most “x-risk” is really “catastrophic risk” — very bad but not extinction
The Doomer Case (“This might kill everyone”)
- Intelligence explosion: AI improves AI → rapid capability gain beyond human control
- Alignment is unsolved and may be extremely hard
- Convergent instrumental goals: any sufficiently intelligent agent will seek power/survival
- We get one chance — an unaligned superintelligence can’t be “fixed” after the fact
- Key proponents: Eliezer Yudkowsky and some researchers at MIRI
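The intelligence-explosion bullet above rests on a claim about feedback: if capability feeds back into the rate of improvement, growth can speed up sharply. A minimal sketch of that intuition, assuming a made-up growth rule where each step adds `rate * capability ** returns_exponent`. The function name, the rule, and every parameter value are hypothetical illustrations, not a forecast or anyone's published model.

```python
def steps_to_reach(threshold, returns_exponent, rate=0.1, start=1.0, max_steps=100_000):
    """Count update steps until a toy 'capability' value reaches `threshold`.

    Hypothetical growth rule (an assumption for illustration only):
        capability += rate * capability ** returns_exponent
    returns_exponent < 1  -> diminishing returns to self-improvement
    returns_exponent == 1 -> ordinary exponential growth
    returns_exponent > 1  -> accelerating, "explosive" growth
    Returns None if the threshold is not reached within `max_steps`.
    """
    capability = start
    for step in range(1, max_steps + 1):
        capability += rate * capability ** returns_exponent
        if capability >= threshold:
            return step
    return None


if __name__ == "__main__":
    # Compare how long each regime takes to grow a million-fold.
    for exponent in (0.5, 1.0, 1.5):
        print(exponent, steps_to_reach(1e6, returns_exponent=exponent))
```

With these arbitrary settings, an exponent above 1 crosses the threshold in far fewer steps than an exponent of 1, which in turn is much faster than an exponent of 0.5. The disagreement between the optimist and doomer cases is partly a disagreement about which regime real AI development resembles, and this toy cannot settle that.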
Key Scenarios
| Scenario | How it could play out |
|---|---|
| Misaligned AGI | System pursues optimisation target in destructive ways |
| Rapid takeoff | AI self-improves faster than we can respond |
| Bioweapon synthesis | AI helps create engineered pathogens |
| Autonomous weapons | Military AI escalates without human control |
| Economic disruption | Mass automation → social collapse |
| Power concentration | AI gives small group unstoppable power |
Important Concepts
Instrumental Convergence
Regardless of its final goal, an intelligent agent would likely:
- Seek self-preservation (can’t achieve goals if turned off)
- Acquire resources (more resources = more capability)
- Prevent goal modification (keep pursuing its objective)
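The self-preservation point can be illustrated with a toy experiment. This is a minimal sketch under strong assumptions: goals are modelled as independent uniform random values over outcomes, shutting down leads to exactly one outcome, and staying on lets the agent choose the best of several. The function name and all parameters are hypothetical.

```python
import random


def fraction_preferring_to_stay_on(num_options=5, num_goals=10_000, seed=0):
    """Estimate how often avoiding shutdown is optimal across random goals.

    Toy assumptions (for illustration only):
    - each goal assigns an independent uniform [0, 1) value to every outcome;
    - being shut down reaches a single fixed outcome;
    - staying on lets the agent pick the best of `num_options` other outcomes.
    """
    rng = random.Random(seed)
    stay_on_count = 0
    for _ in range(num_goals):
        value_if_off = rng.random()
        best_value_if_on = max(rng.random() for _ in range(num_options))
        if best_value_if_on > value_if_off:
            stay_on_count += 1
    return stay_on_count / num_goals


if __name__ == "__main__":
    # More reachable options while "on" means more goals favour staying on.
    for n in (1, 2, 5, 20):
        print(n, fraction_preferring_to_stay_on(num_options=n))
```

Under these assumptions the fraction should come out close to n / (n + 1) for n options, so staying on is favoured by most randomly drawn goals once it keeps more options open. That is the shape of the argument, not evidence that real systems behave this way.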
The Control Problem
How do you maintain control over a system that may be more intelligent than you?
- Boxing (containment) — may not work against superintelligence
- Corrigibility (willingness to be corrected) — hard to train reliably
- Interpretability (understanding the system) — current approaches limited
Timelines
No consensus on when AGI/ASI might arrive:
- Optimists: 50-100+ years (or never, for some definitions)
- Median AI researcher: ~2040-2060 (various surveys)
- Aggressive estimates: 2027-2030
- Takeaway: given this uncertainty, we should prepare regardless of the exact timeline
Major Organisations Working on This
| Organisation | Focus |
|---|---|
| MIRI | Mathematical alignment research |
| Anthropic | Scalable alignment, interpretability |
| DeepMind Safety | Alignment, evaluation |
| ARC (Alignment Research Center, founded by Paul Christiano) | Alignment research |
| CAIS (Center for AI Safety) | AI safety governance and policy |
| Future of Humanity Institute (closed 2024) | Governance, strategy |
| OpenAI Superalignment (disbanded 2024) | Formerly scalable oversight |
The Statement on AI Risk (2023)
“Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.”
Published by the Center for AI Safety and signed by hundreds of AI researchers and leaders, including Hinton, Bengio, Amodei, Altman, and Hassabis.
My Take / Open Questions
TODO: Develop my own position as I learn more
- Where do I sit on the spectrum?
- What’s the strongest argument I can’t dismiss?
- What actions are proportionate to the risk?
Resources
- “Superintelligence” — Nick Bostrom (book)
- “The Alignment Problem” — Brian Christian (book)
- 80,000 Hours AI safety career guide
- AI Safety Fundamentals course (BlueDot Impact)
- AI Alignment — The technical challenge