Existential Risk from AI

Created 2 May 2025
Tags: safety, existential-risk, x-risk, agi, superintelligence

What is it?

The concern that sufficiently advanced AI systems could pose catastrophic or existential risks to humanity — either through misalignment (pursuing goals harmful to humans), misuse (weaponisation), or structural effects (destabilising society).

The Core Arguments

The Optimist Case (“AI is fine”)

  • Current AI has no goals, agency, or desires
  • We’re very far from AGI/superintelligence
  • Economic incentives align with making safe AI
  • Historical pattern: new technologies routinely provoke fear, but none has caused human extinction
  • We can course-correct as issues emerge

The Moderate Case (“Serious risks, manageable”)

  • AI could cause enormous harm short of extinction (mass unemployment, surveillance, concentration of power)
  • Current systems already show concerning failure modes
  • Need strong governance and safety research NOW
  • Most “x-risk” is really “catastrophic risk” — very bad but not extinction

The Doomer Case (“This might kill everyone”)

  • Intelligence explosion: AI improves AI → rapid capability gain beyond human control
  • Alignment is unsolved and may be extremely hard
  • Convergent instrumental goals: almost any sufficiently capable goal-directed agent would tend to seek power and self-preservation as a means to its ends
  • We get one chance — an unaligned superintelligence can’t be “fixed” after the fact
  • Key proponents: Eliezer Yudkowsky, some at MIRI

Key Scenarios

Scenario | How it could play out
Misaligned AGI | System pursues its optimisation target in destructive ways
Rapid takeoff | AI self-improves faster than we can respond
Bioweapon synthesis | AI helps create engineered pathogens
Autonomous weapons | Military AI escalates without human control
Economic disruption | Mass automation → social collapse
Power concentration | AI gives a small group unstoppable power

Important Concepts

Instrumental Convergence

Regardless of its final goal, an intelligent agent would likely:

  • Seek self-preservation (can’t achieve goals if turned off)
  • Acquire resources (more resources = more capability)
  • Prevent goal modification (keep pursuing its objective)

The Control Problem

How do you maintain control over a system that may be more intelligent than you?

  • Boxing (containment) — may not work against superintelligence
  • Corrigibility (willingness to be corrected) — hard to train reliably
  • Interpretability (understanding the system) — current approaches limited

Timelines

No consensus on when AGI/ASI might arrive:

  • Long-timeline sceptics: 50-100+ years (or never, for some definitions)
  • Median AI researcher: ~2040-2060 (various surveys)
  • Aggressive estimates: 2027-2030
  • Point: we should prepare regardless of exact timeline

Major Organisations Working on This

Organisation | Focus
MIRI | Mathematical alignment research
Anthropic | Scalable alignment, interpretability
DeepMind Safety | Alignment, evaluation
ARC (Paul Christiano) | Alignment research
CAIS | AI safety governance and policy
Future of Humanity Institute (closed 2024) | Governance, strategy
OpenAI Superalignment (disbanded 2024) | Scalable oversight (former focus)

The Statement on AI Risk (2023)

“Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.” — Signed by hundreds of AI researchers and leaders (Hinton, Bengio, Amodei, Altman, Hassabis, etc.)

My Take / Open Questions

TODO: Develop my own position as I learn more

  • Where do I sit on the spectrum?
  • What’s the strongest argument I can’t dismiss?
  • What actions are proportionate to the risk?

Resources

  • “Superintelligence” — Nick Bostrom (book)
  • “The Alignment Problem” — Brian Christian (book)
  • 80,000 Hours AI safety career guide
  • AI Safety Fundamentals course (BlueDot Impact)
  • AI Alignment — The technical challenge