AI Security
AI Security
The philosophical questions about AI alignment matter. But right now, today, people are being scammed by cloned voices, manipulated by deepfake videos, and having their systems compromised through prompt injection attacks.
This section is about practical AI security — the threats that exist right now, how they work, and what you can do about them. Whether you’re an individual protecting yourself, a business deploying AI, or a developer building with it, the attack surface is real and growing.
The Threat Landscape
flowchart TD
Threats[AI Security Threats] --> People[Targeting People]
Threats --> Systems[Targeting AI Systems]
Threats --> Data[Targeting Data]
People --> Deepfakes[Deepfakes]
People --> Scams[AI Scams & Social Engineering]
People --> Disinfo[Disinformation]
Systems --> PromptInj[Prompt Injection]
Systems --> Jailbreak[Jailbreaking]
Systems --> ModelTheft[Model Theft & Extraction]
Data --> Poisoning[Data Poisoning]
Data --> Privacy[Privacy & PII Leakage]
Data --> SupplyChain[Supply Chain Attacks] Threats Targeting People
Deepfakes
AI-generated fake images, video, and audio of real people. Used for fraud, political manipulation, non-consensual content, and evidence fabrication. The technology to create them is accessible. The technology to detect them is lagging behind.
AI Scams & Social Engineering
Voice cloning for phone scams. AI-generated phishing emails that are grammatically perfect and personally tailored. Fake customer service chatbots. The scale and sophistication of social engineering has jumped dramatically.
Disinformation
AI makes it trivially easy to generate convincing fake news articles, social media posts, and propaganda at scale. The cost of creating disinformation has dropped to near zero.
Threats Targeting AI Systems
Prompt Injection
The most important AI security vulnerability. An attacker embeds instructions in data that the AI processes, causing it to ignore its real instructions and follow the attacker’s instead. If AI is processing any untrusted input — and it almost always is — prompt injection is a risk.
Jailbreaking
Convincing an AI model to bypass its safety guardrails. Different from prompt injection (which targets the system) — jailbreaking targets the model’s behaviour. Relevant for anyone deploying AI-powered products.
Model Theft & Extraction
Stealing a proprietary model’s capabilities by systematically querying it and training a copy. This threatens the business model of frontier labs and the competitive advantage of any company with custom models.
Threats Targeting Data
Data Poisoning
Corrupting the training data so the model learns wrong things. Could be subtle (bias injection) or catastrophic (backdoor that activates on a trigger). Relevant for any organisation fine-tuning or training models.
Privacy & PII Leakage
Models can memorise and regurgitate training data — including personal information, code secrets, and internal documents. If your data went into training, it might come back out. This intersects directly with GDPR & AI.
Supply Chain Attacks
The AI stack has many dependencies: models, datasets, frameworks, plugins. A compromised dependency (a poisoned model on Hugging Face, a malicious LangChain plugin) can compromise everything downstream.
What You Can Do
For Individuals
- Verify unusual requests — If someone calls asking for money (even in a familiar voice), hang up and call back on a known number
- Question AI-generated content — Assume any image, video, or audio could be synthetic until verified
- Use multi-factor authentication — Voice alone should never be a security factor
- Stay informed — The scam techniques evolve fast. See AI Scams & Social Engineering
For Businesses
- Treat AI as an attack surface — Every AI-powered feature is a potential entry point
- Red team your AI systems — Test for prompt injection, data leakage, and unexpected behaviours
- Implement guardrails — Input validation, output filtering, human approval for consequential actions
- Know your regulatory obligations — The EU AI Act requires risk assessment. ISO 42001 provides a framework. GDPR governs personal data in AI.
For Developers
- Never trust user input — This is web security 101, and it applies doubly to AI
- Separate instructions from data — Don’t let user content and system prompts mix
- Monitor and log — You need observability on what your AI is doing
- Keep humans in the loop — For anything consequential
- See our developer security guide for a structured approach
The Regulatory Response
Governments are waking up to AI security threats:
- The EU AI Act classifies AI systems by risk level and mandates security requirements for high-risk systems
- GDPR already applies to AI processing of personal data
- The US Executive Order on AI established reporting requirements for frontier models
- The UK’s AI Safety Institute focuses on evaluating frontier model safety
- ISO 42001 provides a certifiable AI management system standard
But regulation is reactive. The threats move faster than the law. Individual awareness and organisational discipline matter as much as compliance.
Go Deeper
- Deepfakes — The deepfake problem in detail
- AI Scams & Social Engineering — How AI-powered scams work
- Prompt Injection — The critical system vulnerability
- AI Safety Courses — Structured learning paths for security
- AI Safety & Ethics — The broader ethical context
- Legal & Compliance — What the law requires
- Court Rulings — How courts are handling AI harms
- AI Intelligence Hub — Back to the hub home
Sources
- OWASP Top 10 for LLMs — Industry-standard LLM vulnerability list
- NIST AI Risk Management Framework — US framework
- ENISA AI Threat Landscape — EU cybersecurity agency
- AI Incident Database — Tracked AI failures and harms