ARTICLE

AI Security

Updated 2 May 2025

safetysecuritythreatsdefensepractical

AI Security

The philosophical questions about AI alignment matter. But right now, today, people are being scammed by cloned voices, manipulated by deepfake videos, and having their systems compromised through prompt injection attacks.

This section is about practical AI security — the threats that exist right now, how they work, and what you can do about them. Whether you’re an individual protecting yourself, a business deploying AI, or a developer building with it, the attack surface is real and growing.

The Threat Landscape

flowchart TD
    Threats[AI Security Threats] --> People[Targeting People]
    Threats --> Systems[Targeting AI Systems]
    Threats --> Data[Targeting Data]

    People --> Deepfakes[Deepfakes]
    People --> Scams[AI Scams & Social Engineering]
    People --> Disinfo[Disinformation]

    Systems --> PromptInj[Prompt Injection]
    Systems --> Jailbreak[Jailbreaking]
    Systems --> ModelTheft[Model Theft & Extraction]

    Data --> Poisoning[Data Poisoning]
    Data --> Privacy[Privacy & PII Leakage]
    Data --> SupplyChain[Supply Chain Attacks]

Threats Targeting People

Deepfakes

AI-generated fake images, video, and audio of real people. Used for fraud, political manipulation, non-consensual content, and evidence fabrication. The technology to create them is accessible. The technology to detect them is lagging behind.

AI Scams & Social Engineering

Voice cloning for phone scams. AI-generated phishing emails that are grammatically perfect and personally tailored. Fake customer service chatbots. The scale and sophistication of social engineering has jumped dramatically.

Disinformation

AI makes it trivially easy to generate convincing fake news articles, social media posts, and propaganda at scale. The cost of creating disinformation has dropped to near zero.

Threats Targeting AI Systems

Prompt Injection

The most important AI security vulnerability. An attacker embeds instructions in data that the AI processes, causing it to ignore its real instructions and follow the attacker’s instead. If AI is processing any untrusted input — and it almost always is — prompt injection is a risk.

Jailbreaking

Convincing an AI model to bypass its safety guardrails. Different from prompt injection (which targets the system) — jailbreaking targets the model’s behaviour. Relevant for anyone deploying AI-powered products.

Model Theft & Extraction

Stealing a proprietary model’s capabilities by systematically querying it and training a copy. This threatens the business model of frontier labs and the competitive advantage of any company with custom models.

Threats Targeting Data

Data Poisoning

Corrupting the training data so the model learns wrong things. Could be subtle (bias injection) or catastrophic (backdoor that activates on a trigger). Relevant for any organisation fine-tuning or training models.

Privacy & PII Leakage

Models can memorise and regurgitate training data — including personal information, code secrets, and internal documents. If your data went into training, it might come back out. This intersects directly with GDPR & AI.

Supply Chain Attacks

The AI stack has many dependencies: models, datasets, frameworks, plugins. A compromised dependency (a poisoned model on Hugging Face, a malicious LangChain plugin) can compromise everything downstream.

What You Can Do

For Individuals

Verify unusual requests — If someone calls asking for money (even in a familiar voice), hang up and call back on a known number
Question AI-generated content — Assume any image, video, or audio could be synthetic until verified
Use multi-factor authentication — Voice alone should never be a security factor
Stay informed — The scam techniques evolve fast. See AI Scams & Social Engineering

For Businesses

Treat AI as an attack surface — Every AI-powered feature is a potential entry point
Red team your AI systems — Test for prompt injection, data leakage, and unexpected behaviours
Implement guardrails — Input validation, output filtering, human approval for consequential actions
Know your regulatory obligations — The EU AI Act requires risk assessment. ISO 42001 provides a framework. GDPR governs personal data in AI.

For Developers

Never trust user input — This is web security 101, and it applies doubly to AI
Separate instructions from data — Don’t let user content and system prompts mix
Monitor and log — You need observability on what your AI is doing
Keep humans in the loop — For anything consequential
See our developer security guide for a structured approach

The Regulatory Response

Governments are waking up to AI security threats:

The EU AI Act classifies AI systems by risk level and mandates security requirements for high-risk systems
GDPR already applies to AI processing of personal data
The US Executive Order on AI established reporting requirements for frontier models
The UK’s AI Safety Institute focuses on evaluating frontier model safety
ISO 42001 provides a certifiable AI management system standard

But regulation is reactive. The threats move faster than the law. Individual awareness and organisational discipline matter as much as compliance.

Go Deeper

Deepfakes — The deepfake problem in detail
AI Scams & Social Engineering — How AI-powered scams work
Prompt Injection — The critical system vulnerability
AI Safety Courses — Structured learning paths for security
AI Safety & Ethics — The broader ethical context
Legal & Compliance — What the law requires
Court Rulings — How courts are handling AI harms
AI Intelligence Hub — Back to the hub home

Sources

OWASP Top 10 for LLMs — Industry-standard LLM vulnerability list
NIST AI Risk Management Framework — US framework
ENISA AI Threat Landscape — EU cybersecurity agency
AI Incident Database — Tracked AI failures and harms