AI has a serious copyright problem, but fortunately a fixable one
By Tim O’Reilly, Mike Loukides  |  Feb 01, 2024
GenAI is known for copyright violations. Original content creators cry foul as GenAI developers argue piracy is indispensable to train models. RAG offers a way out of this impasse. Mike Loukides, VP for O’Reilly Media, and Tim O’Reilly, the firm’s founder and CEO, lay it all out.

SEBASTOPOL, CALIFORNIA - Generative artificial intelligence (GenAI) stretches current copyright law in unforeseen and uncomfortable ways. The United States Copyright Office recently issued guidance stating that the output of image-generating AI is not copyrightable unless human creativity went into the prompts that generated it. This pronouncement raises more questions than it answers: How much creativity is needed? Is it the same kind of creativity an artist exercises with a paintbrush?

Another group of cases deals with text - typically novels and novelists - where some argue that training a model on copyrighted material is itself infringement, even if the model never reproduces these texts in its output. The problem with this argument is that reading texts has been part of the human learning process for as long as written language has existed. While people pay to buy books, they do not pay to learn from them.

How should one make sense of this? What should copyright law mean in the age of artificial intelligence (AI)? Technologist Jaron Lanier offers one answer with his idea of data dignity, which implicitly distinguishes between training - or ‘teaching’ - a model and generating output using a model. The former should be a protected activity, Lanier argues, whereas output could indeed infringe someone’s copyright.

This distinction is attractive for several reasons. First, current 

The content herein is subject to copyright by Project Syndicate. All rights reserved. The content of the services is owned or licensed to The Yuan. The copying or storing of any content for anything other than personal use is expressly prohibited without prior written permission from The Yuan, or the copyright holder identified in the copyright notice contained in the content.