AI has a serious copyright problem, but fortunately a fixable one


By Tim O’Reilly, Mike Loukides | Feb 01, 2024


GenAI is known for copyright violations. Original content creators cry foul as GenAI developers argue piracy is indispensable to train models. RAG offers a way out of this impasse. Mike Loukides, VP for O’Reilly Media, and Tim O’Reilly, the firm’s founder and CEO, lay it all out.

SEBASTOPOL, CALIFORNIA - Generative artificial intelligence (GenAI) stretches current copyright law in unforeseen and uncomfortable ways. The United States Copyright Office recently issued guidance stating the output of image-generating AI is not copyrightable unless human creativity went into the prompts that generated it. This pronouncement leaves more questions than it answers: How much creativity is needed? Is it of the same kind an artist exercises with a paintbrush?

Another group of cases deals with text - typically novels and novelists - where some argue that training a model on copyrighted material is itself infringement, even if the model never reproduces these texts in its output. The problem with this argument is that reading texts has been part of the human learning process for as long as written language has existed. While people pay to buy books, they do not pay to learn from them.

How should one make sense of this? What should copyright law mean in the age of artificial intelligence (AI)? Data dignity, which implicitly distinguishes between training - or ‘teaching’ - a model and generating output using a model, is one answer technologist Jaron Lanier offers. The former should be a protected activity, Lanier argues, whereas output could indeed infringe someone’s copyright.

This distinction is attractive for several reasons. First, current

The content herein is subject to copyright by Project Syndicate. All rights reserved.