NYT v OpenAI
The case that could define the economics of the entire AI industry. The New York Times is suing OpenAI and Microsoft for training GPT models on its copyrighted journalism without permission.
If the NYT wins, every AI company that trained on copyrighted internet data faces potential liability. If OpenAI wins, the ruling would be a strong precedent that AI training can qualify as fair use. Either way, the outcome reshapes the industry.
The Facts
In December 2023, The New York Times filed suit against OpenAI and Microsoft in federal court in New York. The core allegations:
- OpenAI used millions of NYT articles to train GPT models without permission or payment
- ChatGPT can reproduce NYT content nearly verbatim when prompted
- This constitutes copyright infringement at massive scale
- It also undermines the NYT’s business model (why subscribe if the AI has the content?)
The NYT’s complaint included examples where ChatGPT reproduced paragraphs of NYT reporting word-for-word, which the NYT presents as evidence that the models had memorised, not just “learned from,” the training data.
The Question
Is training an AI model on copyrighted content “fair use” under US copyright law?
Fair use is the US legal doctrine, codified at 17 U.S.C. § 107, that permits limited use of copyrighted material without permission for purposes like criticism, education, and research. Courts weigh four factors together, and no single factor decides the outcome:
- Purpose — Is the new use “transformative”? (Does it create something new, or just copy?)
- Nature of the original — Is the original creative or factual?
- Amount used — How much of the original was taken?
- Market effect — Does the new use harm the market for the original?
OpenAI argues AI training is transformative. The NYT argues it’s wholesale copying that undermines their business.
Where It Stands
The case is ongoing. Key developments:
- OpenAI has moved to dismiss parts of the case
- The court has allowed the core copyright claims to proceed
- Discovery (exchange of evidence) is underway
- Settlement discussions have been reported but no agreement reached
- Trial date not yet set
Why It Matters
For AI companies: If training on copyrighted data is not fair use, the economic foundation of current AI models is at risk. Companies would need to license the copyrighted material they train on, at costs that could be prohibitive.
For publishers: If it is fair use, publishers lose control of their content to AI companies that capture the economic value without compensation.
For the industry: The answer determines whether the current open-internet training paradigm continues, or whether AI development shifts to licensed, synthetic, and public-domain data.
For you: The outcome affects what AI can know, how much it costs, and who profits from information.
The Bigger Picture
This isn’t the only copyright case — Getty v Stability AI tests similar questions for images, and hundreds of authors have filed suits — but NYT v OpenAI is the flagship. It has the most resources on both sides, the clearest factual record, and the highest profile.
The EU AI Act takes a different approach: it requires providers of general-purpose AI models to publish a summary of the content used for training, giving rights holders information to enforce their rights. The US approach is litigation-first.
See AI Models for context on how training data shapes model capabilities.
Go Deeper
- Court Rulings — All tracked cases
- Getty v Stability AI — The image generation copyright case
- OpenAI — The company at the centre
- Training & Fine-Tuning — How AI training works (and why it needs so much data)
- EU AI Act — Europe’s regulatory approach to training data transparency
- Legal & Compliance — The full legal landscape
Sources
- NYT Original Complaint (PDF) — Primary source
- Reuters — Case Tracking — Ongoing coverage
- EFF — AI and Copyright — Digital rights perspective